I am trying to insert a DataFrame into an existing partitioned Hive table.

I would like to pass the partition columns in as a parameter, but my current approach is not working:

var partitioncolumn="\"deletion_flag\",\"date_feed\""
df.repartition(37).write.
  mode(SaveMode.Overwrite).
  partitionBy(partitioncolumn).
  insertInto("db.table_name")

How can I make this work?

stefanobaghino

2 Answers

2

partitionBy is defined with variadic arguments:

def partitionBy(colNames: String*): DataFrameWriter[T] 

So the column names should be held in a Seq and expanded at the call site:

val partitioncolumn = Seq("deletion_flag", "date_feed")
df.repartition(37).write.mode(SaveMode.Overwrite).partitionBy(
   partitioncolumn: _*
).insertInto("db.table_name")

Here partitioncolumn: _* expands the sequence into individual column-name arguments.
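If the column names arrive as a single comma-separated parameter, as in the question, they need to be split into a sequence first. The following is only a sketch: parseColumns and ColumnParam are hypothetical names, not part of Spark, and it assumes the names themselves contain no commas.

```scala
object ColumnParam {
  // Hypothetical helper (not a Spark API): turns a comma-separated parameter
  // into column names, trimming whitespace and any surrounding double quotes.
  def parseColumns(raw: String): List[String] =
    raw.split(",").toList
      .map(_.trim.stripPrefix("\"").stripSuffix("\""))
      .filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // The same string the question builds by hand.
    val columns = parseColumns("\"deletion_flag\",\"date_feed\"")
    println(columns)
    // With a DataFrameWriter at hand, the list would then be expanded:
    //   df.write.partitionBy(columns: _*)
  }
}
```

Keeping the parsing in one small function means the rest of the job only ever deals with a list of names.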

1

partitionBy takes a variable number of arguments (namely, Strings).

def partitionBy(colNames: String*): DataFrameWriter[T]
//                              ^ this stands for variadic arguments

In Scala, you can append : _* to a sequence to pass its elements as an argument list.

So you could do something like the following:

val partitioncolumn = Seq("deletion_flag", "date_feed")
df.repartition(37).write.
  mode(SaveMode.Overwrite).
  partitionBy(partitioncolumn: _*).
  insertInto("db.table_name")

Passing a sequence as variadic arguments is also described in this Q&A.
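To see the difference without Spark, here is a minimal sketch that uses a hypothetical variadic method, describePartitions, as a stand-in for partitionBy. It suggests why the single quoted string from the question is treated as one column name rather than two.

```scala
object VarargsSketch {
  // Hypothetical stand-in for a variadic method like partitionBy(colNames: String*).
  // Returns how many arguments were received, followed by the arguments themselves.
  def describePartitions(colNames: String*): String =
    s"${colNames.length} column(s): " + colNames.mkString(" | ")

  def main(args: Array[String]): Unit = {
    val partitionColumns = Seq("deletion_flag", "date_feed")

    // Listing the names one by one and expanding a Seq are equivalent calls.
    val direct = describePartitions("deletion_flag", "date_feed")
    val expanded = describePartitions(partitionColumns: _*)
    assert(direct == expanded)
    println(expanded)

    // A single String holding quotes and a comma is still just one argument.
    println(describePartitions("\"deletion_flag\",\"date_feed\""))
  }
}
```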

stefanobaghino