
I'm new to PySpark and looking to overwrite a Delta table partition dynamically. From other resources available online, I could see that Spark supports dynamic partition overwrite by setting the conf below to "dynamic":

spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

However, when I try overwriting partitioned_table with a DataFrame, the below line of PySpark code (on Databricks) overwrites the entire table instead of a single partition of the Delta table.

data.write.insertInto("partitioned_table", overwrite=True)

I did come across the option of using a Hive external table, but it is not straightforward in my case since partitioned_table is backed by Delta files.

Please let me know what I am missing here. Thanks in advance!


1 Answer


See this issue for details regarding dynamic partition overwrite on Delta tables: https://github.com/delta-io/delta/issues/348

You can use replaceWhere instead: it atomically replaces only the rows matching a predicate, leaving the rest of the table untouched.
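A minimal sketch, assuming the table is partitioned by a hypothetical date column (the question doesn't name the partition column, so adjust the predicate to yours):

# Replace only the rows matching the predicate; with a predicate on the
# partition column this rewrites just that partition.
(data.write
    .format("delta")
    .mode("overwrite")
    .option("replaceWhere", "date = '2020-01-01'")  # hypothetical partition column
    .saveAsTable("partitioned_table"))

Note that in older Delta Lake releases the replaceWhere predicate may only reference partition columns, and every row in the incoming DataFrame must satisfy the predicate, otherwise the write fails.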
