
How to save a Spark DataFrame into one partition of a partitioned Hive table?

raw_nginx_log_df.write.saveAsTable("raw_nginx_log")

The above approach overwrites the whole table, not a specific partition. Although I can solve the problem with the following code, it is obviously not elegant.

raw_nginx_log_df.registerTempTable("tmp_table")
sql(s"INSERT OVERWRITE TABLE raw_nginx_log PARTITION (par='$PARTITION_VAR') SELECT * FROM tmp_table")

It seems that no similar question has been asked on stackoverflow.com before!

asked by shengshan zhang (edited by Cœur)
    `raw_nginx_log_df.write.partitionBy("partition_col").mode("overwrite").saveAsTable("raw_nginx_log")` – mrsrinivas Feb 10 '17 at 07:31
  • What if raw_nginx_log_df2 needs to be saved into another partition of the table? – shengshan zhang Feb 10 '17 at 07:33
  • 1
    *"not elegant"* is a matter of personal taste *(for instance, Scala makes me puke)* -- the question is, does it work? Does it make any difference performance-wise? If you are not satisfied, can you contribute a patch to the Spark code base? – Samson Scharfrichter Feb 10 '17 at 08:04
  • Here is the solution: https://stackoverflow.com/questions/38487667/overwrite-specific-partitions-in-spark-dataframe-write-method – Shuguang Yang Jul 12 '17 at 10:28

1 Answer

YourDataFrame.write
  .format("parquet")
  .option("path", "/pathHiveLocation")
  .mode(SaveMode.Append)
  .partitionBy("partitionCol")
  .saveAsTable("YourTable")

This works for Parquet files/tables; you can customize it to your requirements.
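Note that `SaveMode.Append` adds data to a partition rather than replacing it. If the goal is to overwrite only the partition(s) present in the DataFrame, a common alternative is dynamic partition overwrite with `insertInto` (a sketch, assuming Spark 2.3+ where this setting exists, and that `raw_nginx_log_df` contains the partition column `par` as its last column):

    // Only overwrite partitions that actually appear in the DataFrame,
    // instead of truncating the whole table.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    raw_nginx_log_df.write
      .mode("overwrite")
      .insertInto("raw_nginx_log")  // column order must match the table schema

With the default `static` mode, `mode("overwrite")` would drop all existing partitions first, so the `dynamic` setting is what makes this safe for a single-partition refresh.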

answered by Sat