A comprehensive view of active open source tools and emerging trends in the data engineering ecosystem in 2024-2025
Thanks 🙏🏻 this is awesome 🔥
@Alireza Sadeghi One thing about DataFusion: it is not only single-node. Ballista also exists, but I'm not sure whether it is still active.
Yes, Ballista is a distributed DataFusion engine that remains a moderately active project, but with only one or two active contributors. I had already included it on the GitHub page: https://github.com/pracdata/awesome-open-source-data-engineering
I've been reading your work recently and have become a fan. Your research is incredibly detailed and comprehensive—thank you for all your hard work!
Thank you for your kind words! I truly enjoy reading and learning from your articles as well!
Thanks for mentioning Proton by Timeplus. However, it is first and foremost a stream processing platform (like Arroyo, Materialize, and RisingWave), with native row/column storage that makes materialized streams queryable.
Awesome & impressive work. Have you ever considered the Bitol project from the Linux Foundation? It is very complementary to Egeria & OpenLineage.
There are already many MPP engines, for example Impala, StarRocks, ClickHouse, and Doris. What are the differences between these engines, and what is their future?
Please check the links provided in the article for a more in-depth analysis of some of the categories, such as real-time OLAP engines: https://www.pracdata.io/p/state-of-open-source-read-time-olap-2025
Wow, I'm coming from the Microsoft Fabric stack, and I thought it had too many choices!