I'm fairly new to Delta Lake and the lakehouse architecture on Databricks. I have some questions based on the following sequence of actions (sketched in code below):
- I import some Parquet files.
- Convert them to Delta (creating one snappy.parquet file).
- Delete one random row (creating one new snappy.parquet file).
- I check the contents of both snappy files (version 0 and version 1 of the Delta table), and both contain all of the data, each reflecting the state of its version.
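For reference, here is roughly what I'm doing (a minimal PySpark sketch; the paths and the `id = 42` delete predicate are just placeholders for my actual data):

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` already exists

# 1. Import the Parquet files and write them out as a Delta table (version 0).
df = spark.read.parquet("/tmp/source_data")
df.write.format("delta").save("/tmp/delta_table")

# 2. Delete one row; this produces a new data file and a new table version (version 1).
delta_table = DeltaTable.forPath(spark, "/tmp/delta_table")
delta_table.delete("id = 42")  # example predicate, stands in for the random row

# 3. Compare the two versions via time travel.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta_table")
v1 = spark.read.format("delta").option("versionAsOf", 1).load("/tmp/delta_table")
print(v0.count(), v1.count())  # version 1 should have one row fewer

# The transaction log shows which operation created each version.
delta_table.history().show()
```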
Does this mean Delta simply duplicates the data for every new version?
How is that scalable, or am I missing something?