
We have datasets partitioned on date, with a history going back some arbitrary amount of time.

If we need to apply updates to a particular day's data, we'd ideally want to replace just that day's data with new data, and leave data for all other days unchanged.

In Spark, this seems to be possible via the partitionOverwriteMode setting (see "Overwrite specific partitions in spark dataframe write method").
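
For illustration, this is a minimal sketch of the Spark behaviour I have in mind; the paths and the "date" column are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Only overwrite the partitions present in the incoming dataframe,
    # instead of truncating the whole output directory.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    # updated_df holds only the corrected rows for a single day,
    # e.g. date == "2023-08-01" (placeholder path).
    updated_df = spark.read.parquet("/tmp/updates/2023-08-01")

    (updated_df
        .write
        .mode("overwrite")
        .partitionBy("date")
        .parquet("/data/my_dataset"))  # partitions for all other dates stay untouched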

In the Foundry documentation on Snapshot vs. Incremental builds, there is no mention of updating existing data in a dataset; it only seems to address appending to datasets via incremental builds.
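
For context, this is roughly what the append-only incremental pattern from the Foundry docs looks like, as far as I understand it (dataset paths are placeholders, and I may be misreading the API):

    from transforms.api import transform, incremental, Input, Output

    @incremental()
    @transform(
        out=Output("/Project/datasets/output"),   # placeholder path
        source=Input("/Project/datasets/input"),  # placeholder path
    )
    def compute(source, out):
        # In incremental mode, source.dataframe() returns only the rows
        # added since the last build, and write_dataframe appends them
        # to the output rather than replacing existing data.
        out.write_dataframe(source.dataframe())

That covers appending new days, but not replacing an existing day's partition in place.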

user5233494
  • Assuming that your dataset is incremental, I am not aware of a solution that does not break the incremental computation. This seems to be a limitation of the Foundry Catalog. You could identify the Parquet file and transaction that contain the row you want to update, rewrite that file, and perform a manual UPDATE transaction overwriting the single file. However, to my knowledge this will break your incremental... – nicornk Aug 31 '23 at 18:42

0 Answers