3

I am currently doing a POC on Apache Hudi with Spark (Scala).

I am facing a problem while saving a DataFrame with partitioning.

Hudi saves the DataFrame under path/valueOfPartitionCol1/valueOfPartitionCol2/... when using the property PARTITIONPATH_FIELD_OPT_KEY.

But my requirement is path/COL1=value/COL2=value/..., similar to the way Spark partitions data with partitionBy().
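
For reference, a plain Spark write like the one below (col1 and col2 stand in for my partition columns) gives the layout I want:

  // Plain Spark: partitionBy() writes Hive-style folders, path/col1=value/col2=value/...
  df.write
    .partitionBy("col1", "col2")
    .parquet("path")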

Can anyone who has tried custom partitioning with Hudi help me out?

byte_array

2 Answers

2

Can this help? Set the config HIVE_STYLE_PARTITIONING_OPT_KEY to true, as below:

  import org.apache.hudi.DataSourceWriteOptions._
  import org.apache.spark.sql.SaveMode

  batchDF.write.format("org.apache.hudi")
    // keep your existing record key / partition path / table name options here
    .option(HIVE_STYLE_PARTITIONING_OPT_KEY, "true")
    .mode(SaveMode.Append)
    .save(basePath)
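
With that flag on, Hudi writes the partition folders as partitionColumn=value (the constant corresponds to hoodie.datasource.write.hive_style_partitioning), which matches the layout Spark's partitionBy() produces.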
Machi
1

You can create a custom implementation of the KeyGenerator class and override def getKey(record: GenericRecord): HoodieKey. In that method you receive an instance of GenericRecord and return a HoodieKey, which lets you define your own logic for building the record key and the partition path.
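
For example, a rough sketch in Scala (the field names id, col1 and col2 are placeholders for your schema, and the exact packages for KeyGenerator, HoodieKey and TypedProperties shift a bit between Hudi versions):

  import org.apache.avro.generic.GenericRecord
  import org.apache.hudi.common.config.TypedProperties
  import org.apache.hudi.common.model.HoodieKey
  import org.apache.hudi.keygen.KeyGenerator

  // Sketch only: adjust the field names and imports to your Hudi version and schema.
  class HiveStylePartitionKeyGenerator(props: TypedProperties) extends KeyGenerator(props) {

    override def getKey(record: GenericRecord): HoodieKey = {
      // Record key: whatever uniquely identifies a row
      val recordKey = String.valueOf(record.get("id"))

      // Partition path in col=value/col=value form
      val partitionPath = Seq("col1", "col2")
        .map(col => s"$col=${String.valueOf(record.get(col))}")
        .mkString("/")

      new HoodieKey(recordKey, partitionPath)
    }
  }

Then point the write at it with .option(KEYGENERATOR_CLASS_OPT_KEY, classOf[HiveStylePartitionKeyGenerator].getName).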

Sunil Patil