0

In Spark, I want to overwrite specific partitions rather than all of them. I am trying the following command:

spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
df.write \
  .mode("overwrite") \
  .format("csv") \
  .partitionBy("partition_date", "hour") \
  .save("/user/test/test/output/")

This is working as expected in 2.4, but in Spark 2.2.0, it is overwriting all the partitions' data.

Is there any alternate option or configuration to do the same partitionOverwriteMode in spark 2.2.0

Rahul Patidar
  • 189
  • 1
  • 1
  • 14

1 Answers1

0

If you look for partitionOverwriteMode in the Spark Configuration docs page, you'll find that it has been introduced in version 2.3.0. Also, there is some description on this field:

When INSERT OVERWRITE a partitioned data source table, we currently support 2 modes: static and dynamic. In static mode, Spark deletes all the partitions that match the partition specification(e.g. PARTITION(a=1,b)) in the INSERT statement, before overwriting. In dynamic mode, Spark doesn't delete partitions ahead, and only overwrite those partitions that have data written into it at runtime. By default we use static mode to keep the same behavior of Spark prior to 2.3. Note that this config doesn't affect Hive serde tables, as they are always overwritten with dynamic mode. This can also be set as an output option for a data source using key partitionOverwriteMode (which takes precedence over this setting), e.g. dataframe.write.option("partitionOverwriteMode", "dynamic").save(path).

The bit of text in bold also seems to suggest that the behaviour prior to 2.3.0 was simply with spark.sql.sources.partitionOverwriteMode = static. So I expect that this is the behaviour that you will have in 2.2.0.

I did find a Stackoverflow post in which one of the answers says the following: Before Spark 2.3.0, the best solution would be to launch SQL statements to delete those partitions and then write them with mode append.

Hope this helps you a bit!

Koedlt
  • 4,286
  • 8
  • 15
  • 33