3

I am currently doing a POC on Apache Hudi with Spark (Scala).

I am facing a problem while saving a DataFrame with partitioning.

Hudi saves the DataFrame under path/valueOfPartitionCol1/valueOfPartitionCol2/... when using the property PARTITIONPATH_FIELD_OPT_KEY.

But my requirement is path/COL1=value/COL2=value/..., similar to the way Spark partitions data with partitionBy().
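
For reference, a plain Spark write like the one below (col1 and col2 stand in for my partition columns) gives the layout I want:

  // Plain Spark: partitionBy() writes Hive-style folders, path/col1=value/col2=value/...
  df.write
    .partitionBy("col1", "col2")
    .parquet("path")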

Can anyone who has tried custom partitioning with Hudi help me out?

byte_array

2 Answers

2

Can this help? Set the config HIVE_STYLE_PARTITIONING_OPT_KEY to true, as below:

  import org.apache.hudi.DataSourceWriteOptions._
  import org.apache.spark.sql.SaveMode

  batchDF.write.format("org.apache.hudi")
    // keep your existing record key / partition path / table name options here
    .option(HIVE_STYLE_PARTITIONING_OPT_KEY, "true")
    .mode(SaveMode.Append)
    .save(basePath)
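
With that flag on, Hudi writes the partition folders as partitionColumn=value (the constant corresponds to hoodie.datasource.write.hive_style_partitioning), which matches the layout Spark's partitionBy() produces.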
Machi
1

You can create a custom implementation of the KeyGenerator class and override def getKey(record: GenericRecord): HoodieKey. In that method you receive an instance of GenericRecord and return a HoodieKey, which lets you define your own logic for building the record key and the partition path.
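
For example, a rough sketch in Scala (the field names id, col1 and col2 are placeholders for your schema, and the exact packages for KeyGenerator, HoodieKey and TypedProperties shift a bit between Hudi versions):

  import org.apache.avro.generic.GenericRecord
  import org.apache.hudi.common.config.TypedProperties
  import org.apache.hudi.common.model.HoodieKey
  import org.apache.hudi.keygen.KeyGenerator

  // Sketch only: adjust the field names and imports to your Hudi version and schema.
  class HiveStylePartitionKeyGenerator(props: TypedProperties) extends KeyGenerator(props) {

    override def getKey(record: GenericRecord): HoodieKey = {
      // Record key: whatever uniquely identifies a row
      val recordKey = String.valueOf(record.get("id"))

      // Partition path in col=value/col=value form
      val partitionPath = Seq("col1", "col2")
        .map(col => s"$col=${String.valueOf(record.get(col))}")
        .mkString("/")

      new HoodieKey(recordKey, partitionPath)
    }
  }

Then point the write at it with .option(KEYGENERATOR_CLASS_OPT_KEY, classOf[HiveStylePartitionKeyGenerator].getName).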

Sunil Patil