
I'm new to PySpark and looking to overwrite a Delta table partition dynamically. From other resources available online, I could see that Spark supports dynamic partition overwrite by setting the conf below to "dynamic":

spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

However, when I try overwriting partitioned_table with a DataFrame, the below line of PySpark code (on Databricks) overwrites the entire table instead of a single partition of the Delta table.

data.write.insertInto("partitioned_table", overwrite=True)

I did come across the option of using a Hive external table, but it is not straightforward in my case since partitioned_table is backed by Delta files.

Please let me know what I am missing here. Thanks in advance!


1 Answer


See this issue for details regarding dynamic partition overwrite on Delta tables: https://github.com/delta-io/delta/issues/348

You can use replaceWhere instead: it atomically replaces only the rows matching a predicate, leaving the rest of the table untouched.
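A minimal sketch, assuming the table is partitioned by a hypothetical date column (the question doesn't name the partition column, so adjust the predicate to yours):

# Replace only the rows matching the predicate; with a predicate on the
# partition column this rewrites just that partition.
(data.write
    .format("delta")
    .mode("overwrite")
    .option("replaceWhere", "date = '2020-01-01'")  # hypothetical partition column
    .saveAsTable("partitioned_table"))

Note that in older Delta Lake releases the replaceWhere predicate may only reference partition columns, and every row in the incoming DataFrame must satisfy the predicate, otherwise the write fails.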
