I've read a lot about Snowflake caching, but I don't understand how Snowflake gets around the data-freshness problem encountered by other "shared-everything" architectures. My understanding is that, with shared-everything, when the data changes there essentially needs to be some form of locking.
As explained in this article:
For efficiency both nodes have cached local copies of record 1 in memory. A client then tries to update record 1 so that ‘foo’ becomes ‘bar’. To do this in a consistent manner the DBMS must take a distributed lock on all nodes that may have cached record 1. Such distributed locks become slower and slower as you increase the number of machines in the cluster and as a result can impede the scalability of the writing process.
In other words, if Snowflake is caching data in the compute layer, but the underlying data changes in S3, doesn't Snowflake run into the same caching issues as other shared-everything architectures? I get that an analytics database sees far fewer updates, but then why wouldn't traditional shared-everything architectures work?
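To make sure I'm describing the problem correctly, here's a rough sketch of the lock-every-node cache invalidation the quoted article seems to be describing (all names here are made up for illustration, not any real DBMS's API):

```python
import threading

class Node:
    """A shared-everything node that caches hot records locally."""
    def __init__(self, node_id, storage):
        self.node_id = node_id
        self.storage = storage            # the single shared store
        self.cache = {}                   # record_id -> cached value
        self.lock = threading.Lock()      # stands in for a network round trip

    def read(self, record_id):
        # Serve from the local cache, filling it from shared storage on a miss.
        if record_id not in self.cache:
            self.cache[record_id] = self.storage[record_id]
        return self.cache[record_id]

    def invalidate(self, record_id):
        self.cache.pop(record_id, None)

def update_record(nodes, storage, record_id, new_value):
    # To keep every cached copy consistent, the writer has to coordinate with
    # *every* node that might have cached the record -- the "distributed lock"
    # from the article. The more nodes, the more coordination per write.
    for node in nodes:
        node.lock.acquire()
    try:
        storage[record_id] = new_value
        for node in nodes:
            node.invalidate(record_id)
    finally:
        for node in reversed(nodes):
            node.lock.release()
```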
The only answer I've seen is that Snowflake has a query optimizer that checks whether the underlying data has changed. But I still don't understand this, or how it is a significant upgrade over the caching strategy of traditional shared-everything DBs.
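To be concrete about what I think that check means, here's my (possibly wrong) mental model of it, where `metadata_service`, `current_version`, and the cache/S3 objects are placeholders I invented rather than Snowflake's actual internals:

```python
from collections import namedtuple

CachedTable = namedtuple("CachedTable", ["version", "data"])

def scan_table(table, warehouse_cache, metadata_service, s3):
    # Ask the central metadata layer which version of the table is current.
    # This is one lookup against a single service, not a lock taken across
    # every compute node that might have cached the data.
    version = metadata_service.current_version(table)

    cached = warehouse_cache.get(table)
    if cached is not None and cached.version == version:
        # The cache entry was built from this exact version, so it is still
        # fresh and can be served without touching S3.
        return cached.data

    # Otherwise re-read from S3 and remember which version was cached.
    data = s3.read(table, version)
    warehouse_cache[table] = CachedTable(version=version, data=data)
    return data
```

If that's roughly right, what makes this version check fundamentally cheaper than what a traditional shared-everything DB could do with its cache? That's the part I'm missing.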