10 Comments
Yuriy Gavrilov

Thanks 🙏🏻 this is awesome 🔥

Yuriy Gavrilov

@Alireza Sadeghi one thing about DataFusion: it is not only single-node. Ballista exists, but I'm not sure whether it is still active.

Alireza Sadeghi

Yes, Ballista is a distributed DataFusion engine that remains a moderately active project, but with only one or two active contributors. I had already included it on the GitHub page: https://github.com/pracdata/awesome-open-source-data-engineering

Vu Trinh

I've been reading your work recently and have become a fan. Your research is incredibly detailed and comprehensive. Thank you for all your hard work!

Alireza Sadeghi

Thank you for your kind words. I truly enjoy reading and learning from your articles as well!

Sarwar Bhuiyan

Thanks for mentioning Proton by Timeplus. However, it is first and foremost a stream processing platform (like Arroyo, Materialize, and RisingWave), with native row/column storage for querying materialized streams.

Jean-Georges Perrin

Awesome and impressive work. Have you ever considered the Bitol project from the Linux Foundation? It is very complementary to Egeria and OpenLineage.

lalilu

There are already many MPP engines, for example Impala, StarRocks, ClickHouse, and Doris. What are the differences between these engines, and what does their future look like?

Alireza Sadeghi

Please check the links provided in the article for a more in-depth analysis of some of the categories, such as real-time OLAP engines: https://www.pracdata.io/p/state-of-open-source-read-time-olap-2025

Donald Parish

Wow. I'm coming from the Microsoft Fabric stack, and I thought it had too many choices!
