State of Open Source Workflow Orchestration Systems 2025
Overview of Major 2024 Trends and Emerging Technologies Shaping 2025
This is the fifth part in the Data Landscape Trends 2024-2025 series, focusing on the state of open source workflow orchestration systems.
Introduction
In the rapidly evolving landscape of data engineering, workflow orchestration engines play a key role in managing complex data processes.
This analysis explores the current state of workflow orchestration engines through multiple lenses such as community engagement, technical architecture, adoption metrics, and emerging innovations in 2024.
We'll cover the following topics:
Current open source workflow orchestration landscape
Open Source vs Open Core engines
Task-centric vs Data-centric engines
GitHub repository trends in 2024
Summary and analysis of the current state of the products
Major 2024 workflow orchestration trends
Recommendations and conclusion
Current OSS Landscape and Major Products
The workflow orchestration category stands out as one of the most dynamic segments of the open source data engineering ecosystem.
It features over 10 active projects that range from established products like Apache Airflow to newly open sourced engines like Netflix's Maestro.
The evolution of major open source workflow orchestration engines traces back to 2008, when Yahoo developed the first significant workflow engine, Oozie, to address the growing complexity of managing workloads on the Hadoop platform.
Since then, the industry has developed numerous orchestration systems to meet the growing demands of workload management and orchestration on data platforms.
Some projects, such as Orchest, have come and gone and are no longer maintained. Such retired projects are excluded from this analysis.
The timeline below illustrates the development progression of major open source workflow orchestration engines, highlighting both their initial open source releases and, where applicable, their subsequent donations to open source foundations.
I remember back in 2018 when we had to pick a workflow engine for a new large-scale data platform. Our options were pretty much just Luigi, Azkaban, and Airflow.
The choice was really simple back then - Airflow was a clear winner since it was Python-based and had great features. But nowadays it's so much harder to navigate this landscape and do a proper comparison of architectures and features between all the available tools in the market.
Netflix's New Contribution
An exciting development in this ecosystem came when Netflix open sourced their next-generation orchestrator, Maestro, in July 2024.
Introduced via their tech blog, Maestro is designed as a highly scalable and flexible scheduler capable of handling large-scale heterogeneous workflows like ML training and data pipelines.
What makes Maestro stand out is its flexible execution support for Docker images and notebooks, along with its ability to handle both cyclic and acyclic (DAG) workflow patterns.
Since its July release, Maestro has gained notable traction in the community. However, the repository has seen limited code activity since the initial release.
Back-end Language
In terms of back-end languages, these tools have a fairly even distribution between Java, Go, and Python, with the exception of Windmill, which is built using the rising Rust language.
Open Source vs Open Core Engines
It's important to note that not all these projects are truly open source. Some follow an "open core" model instead, where the main SaaS provider releases only certain core components as open source while keeping premium features, such as monitoring and security, proprietary.
When evaluating these tools for adoption, it's crucial to assess how portable and genuinely open source each project really is, as this can impact long-term sustainability and cost.
Many current open core tools like Kestra and Dagster keep essential enterprise features – especially security features like SSO – locked behind their enterprise versions. This is a deliberate strategy to monetise enterprise clients who need these capabilities.
This approach creates a significant problem for OSS adoption: businesses that care about security and governance can't realistically use the open source versions of these products.
Open Core users frequently complain about this limitation, particularly the lack of basic authentication and authorisation mechanisms in the open core versions.
Currently, only projects like Apache Airflow, Flyte and Apache DolphinScheduler are guaranteed to remain fully open source, as they're not owned by any single commercial entity but are instead governed by an open source foundation.
Task-Centric vs Data-Centric Engines
Workflow orchestration engines can be broadly classified by their fundamental approach to workflow management: task-centric versus data-centric, alongside other categories like declarative vs code-based and batch vs event-driven.
Task-Centric Orchestrators
Airflow, Luigi, Cadence, and Kestra exemplify the task-centric approach, organising workflows as Directed Acyclic Graphs (DAGs) of interconnected tasks.
In these engines, the task is the primary unit of work, capable of executing any type of operation. The scheduler's main concern is managing control flow and dependencies between tasks within the DAG, remaining largely agnostic to the actual work being performed.
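The task-centric model can be illustrated with a minimal sketch in plain Python. It is not the API of Airflow, Luigi, or any other engine mentioned here: the task names, the `TASKS` and `DEPENDENCIES` mappings, and the `run_dag` helper are all hypothetical. The point is that the scheduler only needs the dependency graph and treats each task as an opaque callable.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each one is an opaque callable.
# The scheduler below never inspects what a task actually does.
def extract():
    return "extracted"

def transform():
    return "transformed"

def load():
    return "loaded"

TASKS = {"extract": extract, "transform": transform, "load": load}

# Control-flow dependencies: task name -> set of upstream task names.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

def run_dag(tasks, dependencies):
    """Run every task once, only after all of its upstream tasks have run."""
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()  # the work itself is a black box to the scheduler
        order.append(name)
    return order

execution_order = run_dag(TASKS, DEPENDENCIES)
print(execution_order)  # ['extract', 'transform', 'load']
```

Note that no data flows between the tasks in this sketch: only ordering is managed, which is what "agnostic to the actual work" means in practice.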
Data-Centric Orchestrators
Engines like Dagster, Temporal, and Flyte take a more opinionated, data-centric approach. In these engines, data-oriented objects (or "assets" in Dagster's terminology) serve as the primary focus of the workflow.
They treat workflows as data-aware pipelines where assets - whether tables, files, ML models, or dbt models - are produced, consumed, and transformed.
Data-centric engines provide native support for passing data between tasks and offer superior integration with modern data transformation frameworks like dbt and SQLMesh, compared to task-centric engines.
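To make the contrast concrete, here is a rough sketch of the data-centric idea in plain Python. It is loosely inspired by asset-style definitions but is not Dagster's or Flyte's actual API: the `asset` decorator, the `ASSETS` registry, and `materialize` are hypothetical names invented for this illustration. Dependencies are inferred from what each asset consumes, and the engine passes the produced data downstream.

```python
import inspect
from graphlib import TopologicalSorter

ASSETS = {}  # hypothetical registry: asset name -> function that produces it

def asset(fn):
    """Register a function as an asset; its parameter names are its upstream assets."""
    ASSETS[fn.__name__] = fn
    return fn

@asset
def raw_orders():
    # Illustrative in-memory data standing in for a table or file.
    return [{"id": 1, "amount": 40}, {"id": 2, "amount": 60}]

@asset
def order_totals(raw_orders):
    # Consumes the upstream asset's value directly.
    return sum(row["amount"] for row in raw_orders)

def materialize(assets):
    """Build every asset in dependency order, passing upstream values downstream."""
    deps = {name: set(inspect.signature(fn).parameters) for name, fn in assets.items()}
    values = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {dep: values[dep] for dep in deps[name]}
        values[name] = assets[name](**inputs)
    return values

materialized = materialize(ASSETS)
print(materialized["order_totals"])  # 100
```

Unlike the task-centric case, the graph here is never declared separately: it falls out of the data each asset needs.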
GitHub Repository Trends
Open source projects are typically evaluated through key metrics including GitHub stars, download counts, contributor activity, and repository engagement (measured by commits, releases, and issue resolution rates).
As part of my commitment to understanding the open source ecosystem, I've developed my own small analytical platform that tracks and analyses all GitHub events for public repositories. The following metrics and trends for 2024 are derived from this platform.
Project Popularity
Looking at GitHub repository star trends in 2024, Kestra has emerged as a rising workflow orchestration project. The graph below shows a spike in September, when Kestra surpassed all other projects in new stars gained in 2024.
This surge is directly linked to Kestra's $8M funding announcement, which was featured in TechCrunch. It's a clear example of how repository stars can spike in response to major company announcements.
The well-established Apache Airflow and Prefect ranked second and third respectively among the most-starred workflow projects in 2024.
Code Activity
Code activity in open source projects can be measured by two main metrics: pull requests (opened, closed, and reviewed) and commit volume (push events).
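As a rough illustration of how these two metrics could be derived from a stream of GitHub events, consider the sketch below. `PullRequestEvent`, `PullRequestReviewEvent`, and `PushEvent` are real GitHub event types, but the flat record shape (`repo` and `type` keys), the repository names, and the `summarize_code_activity` helper are assumptions made for this example. It is not the author's analytical platform.

```python
from collections import Counter

# GitHub event types that map onto the two activity metrics.
PR_EVENT_TYPES = {"PullRequestEvent", "PullRequestReviewEvent"}
PUSH_EVENT_TYPES = {"PushEvent"}

def summarize_code_activity(events):
    """Count pull-request events and push events per repository."""
    pr_counts, push_counts = Counter(), Counter()
    for event in events:
        repo = event["repo"]
        if event["type"] in PR_EVENT_TYPES:
            pr_counts[repo] += 1
        elif event["type"] in PUSH_EVENT_TYPES:
            push_counts[repo] += 1
    repos = sorted(set(pr_counts) | set(push_counts))
    return {
        repo: {"pull_requests": pr_counts[repo], "pushes": push_counts[repo]}
        for repo in repos
    }

# Hypothetical sample events, simplified to the two fields used above.
sample_events = [
    {"repo": "example/repo-a", "type": "PullRequestEvent"},
    {"repo": "example/repo-a", "type": "PullRequestReviewEvent"},
    {"repo": "example/repo-a", "type": "PushEvent"},
    {"repo": "example/repo-b", "type": "PushEvent"},
    {"repo": "example/repo-b", "type": "WatchEvent"},  # ignored: not code activity
]

activity = summarize_code_activity(sample_events)
```

A single push event can carry several commits, so a real pipeline would likely need to look inside the push payload to count commits rather than pushes.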
For 2024, Dagster and Airflow led the pack in pull request activity, each processing over 10K PRs from their contributors, with Prefect following close behind.
On the other end of the spectrum, projects like Cadence, Luigi, Maestro, and Azkaban showed concerning levels of inactivity, raising questions about their long-term health.
Looking at commit volume, Dagster demonstrated remarkable development activity with an impressive 27K commits in 2024. Prefect and Windmill also showed strong development momentum, each recording over 10K commits.
Project Collaboration
The health and sustainability of an open source project largely depends on its contributor base – the wider and more diverse, the better.
When evaluating contributor metrics, it's crucial to distinguish between active contributors who consistently work throughout the year and one-off contributors who make occasional submissions. Active contributors provide a more meaningful measure of project health.
Looking at active contributors in 2024, Airflow and Dagster lead the ecosystem with over 20 active contributors each. Any major open source project with few (e.g. fewer than 5) active contributors raises sustainability concerns. By this metric, projects like Argo Workflows, Mage-ai, DolphinScheduler, and Flyte fall into a warning zone.
At the concerning end of the spectrum, projects like Luigi and Azkaban showed no active contributions throughout the year.
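The contributor check described above can be sketched as follows. The article does not define exactly when a contributor counts as "active", so the rule used here (activity in at least six distinct months of the year) is an assumption, as are the function names and the "healthy" label. Only the fewer-than-5 warning threshold and the zero-contributor case come from the text.

```python
from collections import defaultdict

def count_active_contributors(contributions, min_active_months=6):
    """Count authors who contributed in at least `min_active_months` distinct months.

    `contributions` is an iterable of (author, month) pairs.
    The six-month default is an assumption, not a rule from the article.
    """
    months_by_author = defaultdict(set)
    for author, month in contributions:
        months_by_author[author].add(month)
    return sum(
        1 for months in months_by_author.values() if len(months) >= min_active_months
    )

def sustainability_status(active_contributors, warning_threshold=5):
    """Map an active-contributor count onto the zones described in the text."""
    if active_contributors == 0:
        return "inactive"
    if active_contributors < warning_threshold:
        return "warning"
    return "healthy"

# Hypothetical data: one year-round contributor and one one-off contributor.
sample_contributions = [("alice", month) for month in range(1, 13)] + [("bob", 3)]

active = count_active_contributors(sample_contributions)
status = sustainability_status(active)
print(active, status)  # 1 warning
```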
Community Engagement
Community engagement can be measured through several indicators: issues logged, comment volume, and participation in official community channels like Slack and discussion boards. These metrics help determine how vibrant and active a tool's community really is.
Another key metric is the ratio of closed to opened issues, which indicates how quickly project maintainers address community-reported problems.
Looking at GitHub activity in terms of total issues opened and closed, Airflow, Kestra, Prefect, and DolphinScheduler show the strongest community engagement.
Based on total issues registered, we can consider fewer than 100 issues or less than 50% issue resolution a concern, and fewer than 50 issues a danger zone. Again, projects like Luigi, Azkaban, and Cadence fall into this danger zone, suggesting minimal community interaction.
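These thresholds translate directly into a small classifier. The sketch below assumes the resolution rate is simply closed issues divided by opened issues over the same period. The function name and the "ok" label are hypothetical.

```python
def classify_issue_engagement(opened, closed):
    """Classify community engagement from yearly issue counts.

    Fewer than 50 issues is the danger zone; fewer than 100 issues, or a
    resolution rate below 50%, is a concern; anything else is "ok".
    """
    if opened < 50:
        return "danger"
    resolution_rate = closed / opened  # safe: opened >= 50 at this point
    if opened < 100 or resolution_rate < 0.5:
        return "concern"
    return "ok"

print(classify_issue_engagement(30, 30))    # danger
print(classify_issue_engagement(80, 70))    # concern
print(classify_issue_engagement(400, 150))  # concern
print(classify_issue_engagement(400, 300))  # ok
```

Checking the danger zone first matters: a project with 30 issues and a perfect resolution rate is still flagged, because the low volume itself is the signal.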
Downloads & Installations
Most open source orchestration tools are either Python-based or provide Python clients and SDKs, making PyPI download statistics a useful metric for measuring adoption and popularity.
Looking at the download stats from clickpy.clickhouse.com, Apache Airflow dominates the ecosystem with a staggering 320M downloads in 2024 alone - ten times more than its nearest competitor. This reinforces Airflow's position as the leading tool in the entire data engineering ecosystem.
Prefect and Dagster round out the top three most downloaded packages in 2024, with 32M and 15M downloads respectively.
An interesting observation: despite being an inactive project, Luigi recorded 5.6M downloads in 2024. This likely reflects existing users updating to minor releases, suggesting a significant legacy user base still relies on the platform.
Summary & Analysis
Here is a summary of how the workflow orchestration engines compare across key GitHub metrics:
Advancing Projects
After a decade, Airflow remains the dominant force in open source orchestration, maintaining the most active and vibrant open source project in the market.
Dagster is likely the second most popular orchestrator in 2024. Along with Prefect and Temporal, it's capturing significant market attention, particularly among startups and smaller-scale deployments.
These tools stand out for their simplified approach to data-centric workflow management, more intuitive UIs, and enhanced support for event-driven workflows.
Rising Projects
Kestra has become one of the fastest-growing orchestration tools in 2024, gaining momentum after securing $8M in funding. The project has also been praised for its simplicity, declarative YAML-based workflow definitions, and support for event-driven workflows.
Declining Projects
Legacy tools Luigi and Azkaban rank at the bottom across all metrics. While neither project has been officially archived or retired, their lack of meaningful development activity in 2024 effectively marks them as inactive.
Luigi saw only minor bug fixes throughout the year, while Azkaban showed no code activity whatsoever. This dramatic decline in maintenance suggests these once-popular orchestrators have reached the end of their active lifecycle.
The future of Netflix's Maestro remains uncertain. 2025 will be a pivotal year, revealing whether the project gains momentum on GitHub or follows the path of some other abandoned in-house tools released by tech giants.
2024 OSS Orchestration Competition
Let's turn this into a competition and rank our open source workflow engines!
We'll identify the top three performers across key metrics in 2024, creating a sort of "workflow orchestrator competition."
Based on our OSS metrics and medal counts, Apache Airflow claims the crown as 2024's champion workflow orchestrator, with Dagster taking second and Prefect earning third place.
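One way such a medal table could be tallied is sketched below. The article does not state how medals are weighed against each other, so the Olympic-style ordering (golds first, then silvers, then bronzes) is an assumption. The metric names and the `tool_a`, `tool_b`, `tool_c` placeholders are purely illustrative and do not represent the actual 2024 results.

```python
from collections import Counter

MEDALS = ("gold", "silver", "bronze")

def tally_medals(rankings):
    """Award gold, silver, and bronze per metric.

    `rankings` maps a metric name to its top three projects, best first.
    """
    table = {}
    for top_three in rankings.values():
        for medal, project in zip(MEDALS, top_three):
            table.setdefault(project, Counter())[medal] += 1
    return table

def leaderboard(table):
    """Order projects by gold, then silver, then bronze count (name breaks ties)."""
    def sort_key(project):
        return tuple(-table[project][medal] for medal in MEDALS) + (project,)
    return sorted(table, key=sort_key)

# Placeholder rankings, NOT the real 2024 data.
sample_rankings = {
    "stars": ["tool_b", "tool_a", "tool_c"],
    "pull_requests": ["tool_a", "tool_b", "tool_c"],
    "downloads": ["tool_a", "tool_c", "tool_b"],
}

medal_table = tally_medals(sample_rankings)
podium = leaderboard(medal_table)
print(podium)  # ['tool_a', 'tool_b', 'tool_c']
```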
Important note: This comparison focuses solely on open source activity metrics and community engagement. It should not be interpreted as a judgment of each tool's features, capabilities, or overall ecosystem. The best workflow orchestrator for your needs will depend on your specific requirements, use cases, and technical environment.
Major 2024 Trends
Let’s explore the key development trends in the workflow orchestration ecosystem for 2024.
Event-Driven & Real-Time Orchestration
The workflow orchestration ecosystem is shifting toward event-triggered and real-time processing capabilities, reflecting the industry's growing demand for real-time workload management.
In 2024, several major products made significant moves in this direction. Kestra introduced Real-time and HTTP Triggers, enabling millisecond-latency responses to events from systems like Kafka and AWS SQS, as well as to incoming HTTP requests.
Temporal enhanced its real-time capabilities with Workflow Update and Workflow Update-With-Start features, enabling synchronous processing for interactive applications. Meanwhile, DolphinScheduler expanded its event-driven architecture with a variety of new triggers.
Mage focused on real-time data processing, introducing Streaming Pipelines that support real-time ingestion and transformation from sources like Kafka and Google Pub/Sub.
Even Apache Airflow, traditionally a batch-oriented system, has recognised this shift toward real-time processing. Its 2024 updates added new conditions for data-aware scheduling, along with a new scheduling mechanism that supports scheduling DAGs based on both dataset events and time.
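The idea of combining dataset conditions with a time-based schedule can be sketched in plain Python. This is a conceptual illustration only and deliberately does not use Airflow's own classes: `Dataset`, `AllOf`, `AnyOf`, `should_trigger`, and the example URIs are all hypothetical names defined here.

```python
from dataclasses import dataclass

class Condition:
    """Base class so conditions can be combined with & (all) and | (any)."""
    def __and__(self, other):
        return AllOf(self, other)

    def __or__(self, other):
        return AnyOf(self, other)

@dataclass(frozen=True)
class Dataset(Condition):
    uri: str

    def is_met(self, updated):
        return self.uri in updated

class AllOf(Condition):
    def __init__(self, *parts):
        self.parts = parts

    def is_met(self, updated):
        return all(part.is_met(updated) for part in self.parts)

class AnyOf(Condition):
    def __init__(self, *parts):
        self.parts = parts

    def is_met(self, updated):
        return any(part.is_met(updated) for part in self.parts)

def should_trigger(condition, updated_datasets, time_schedule_due):
    """Trigger when the time-based schedule is due OR the dataset condition holds."""
    return time_schedule_due or condition.is_met(updated_datasets)

orders = Dataset("s3://example-bucket/orders")
customers = Dataset("s3://example-bucket/customers")
refunds = Dataset("s3://example-bucket/refunds")

# Run when both orders and customers were updated, or when refunds was updated.
condition = (orders & customers) | refunds

print(should_trigger(condition, {orders.uri}, False))                 # False
print(should_trigger(condition, {orders.uri, customers.uri}, False))  # True
print(should_trigger(condition, set(), True))                         # True
```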
AI/LLM Integration & Automation
The integration of AI and LLM capabilities emerged as another major trend in workflow orchestration during 2024, reflecting the growing role of LLM-based workloads in enterprise data operations.
Prefect made a significant move in this space by launching ControlFlow, a framework specifically designed for AI-driven workflows and LLM integration. Prefect also integrated Marvin, an LLM-powered assistant, to simplify the creation of AI workflows.
Temporal embraced multi-agent workflows, enabling sophisticated coordination between AI models, software applications, and human participants.
Meanwhile, Windmill took a different approach by integrating AI directly into the development experience, introducing an AI copilot to assist in flow building.
Enhanced Resource Management & Execution
Intelligent resource management has become a critical focus for workflow engines, particularly as organisations increasingly run workflows on cloud-managed and serverless platforms. Several cloud-native engines made significant advances in this area during 2024.
Temporal introduced sophisticated resource management with its worker auto-tuning feature, which automatically adjusts worker slots based on real-time CPU and memory usage.
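The general shape of usage-based slot tuning can be shown with a deliberately simple sketch. This is not Temporal's algorithm or API: the `tune_worker_slots` function, the 80% target, and the one-slot step size are assumptions chosen only to show how a worker might grow or shrink its capacity from CPU and memory readings.

```python
def tune_worker_slots(current_slots, cpu_usage, memory_usage,
                      target_usage=0.8, min_slots=1, max_slots=100):
    """Return an adjusted slot count from resource usage given as fractions (0.0 to 1.0).

    The more constrained resource drives the decision: shed one slot when it
    is above the target, add one slot when it is below.
    """
    pressure = max(cpu_usage, memory_usage)
    if pressure > target_usage:
        proposed = current_slots - 1
    elif pressure < target_usage:
        proposed = current_slots + 1
    else:
        proposed = current_slots
    # Keep the result inside the configured bounds.
    return max(min_slots, min(max_slots, proposed))

print(tune_worker_slots(10, cpu_usage=0.95, memory_usage=0.40))  # 9
print(tune_worker_slots(10, cpu_usage=0.30, memory_usage=0.50))  # 11
```

A production tuner would typically smooth the readings over time and adjust in larger or proportional steps, so treat this as the control-loop idea rather than a usable policy.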
Kestra has introduced task runners that can dynamically offload resource-intensive tasks to on-demand compute services like Azure Batch, Google Batch, and Google Cloud Run.
Dagster Pipes became stable in version 1.8, released in 2024, with enhanced integrations for Lambda, Kubernetes, and Databricks planned going forward.
DolphinScheduler plans to integrate KEDA (Kubernetes Event-Driven Autoscaling), which will enable automatic worker scaling based on workload demands, further enhancing its Kubernetes-native capabilities.
Prefect and Flyte expanded their back-end execution capabilities in 2024 by enhancing support for distributed computing frameworks. Integrations with scalable Python execution frameworks such as Ray and Dask enable more efficient parallel processing and distributed task execution.
Conclusion & Recommendations
After a decade, Apache Airflow remains the most mature and widely adopted orchestration tool in the data engineering ecosystem. Its position as the market leader is reinforced by major cloud vendors - Google Cloud Composer and Amazon MWAA have both standardised on Airflow for their managed workflow services.
While Airflow faces criticism for its steep learning curve, operational overhead, and a not-so-friendly UX with an outdated UI (though a complete revamp is planned for the upcoming version 3.0), its primary technical limitation is its focus on batch-oriented workflows, with less native support for modern dynamic workflow patterns.
For large-scale deployments managing a large number of heterogeneous workflows that require a general-purpose engine with extensive operations support and a large ecosystem, Apache Airflow remains the top choice. At the Airflow Summit 2024, major companies showcased Airflow's massive scalability, with Uber orchestrating 450K pipeline runs daily across 1000 teams, Stripe managing 150K tasks, and LinkedIn operating over 10K parallel DAGs.
Startups and small to mid-sized businesses should consider newer orchestration tools that offer a streamlined setup and development experience through features like in-browser development environments, declarative workflow authoring, and low-code capabilities.
For dynamic and data-centric workflows, products like Prefect and Dagster handle data-aware orchestration better than traditional task-based schedulers.