I’m looking into several “transactional data lake” technologies, such as Apache Hudi, Delta Lake, and AWS Lake Formation Governed Tables.
Except for the latter, I can’t see how these would work in a multi-cluster environment. I’m baselining against S3 for storage and want to incrementally alter my data lake, where many clusters may be reading from and writing to the lake at any given time. Is this possible/supported? It seems like the compaction and transaction processes run on-cluster, which would mean you cannot manage a transactional data lake with these platforms from multiple disparate sources. Or am I mistaken?
Any anecdotes or performance limitations you’ve found would be appreciated!