While going through the Hudi documentation I came across the Metadata Config section and was curious about how it is used. I created a table with metadata enabled, and a directory was created under /.hoodie/metadata. Has anybody experimented with this feature? Is the metadata exposed to users, or is it only used internally by Hudi? What is it used for? I couldn't work it out from the docs.
I used the following Hudi options to create a table in S3 using PySpark.
hudi_options_insert = {
    "hoodie.table.name": "table_p5",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.operation": "bulk_insert",
    "hoodie.datasource.write.partitionpath.field": "ds",
    "hoodie.datasource.write.precombine.field": "id",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.hive_sync.table": "table_p5",
    "hoodie.datasource.hive_sync.database": "poc_hudi",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.partition_fields": "ds",
    "hoodie.insert.shuffle.parallelism": 6,
    "hoodie.metadata.enable": "true",
    "hoodie.metadata.insert.parallelism": 6,
}
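For context, here is a minimal sketch of how an options dict like this is typically passed to a PySpark write. The helper name write_hudi, the DataFrame df and the S3 path are placeholders for illustration, not taken from the actual job, and the sketch assumes the Hudi Spark bundle is already on the classpath.

```python
def write_hudi(df, options, base_path, mode="append"):
    """Write a Spark DataFrame as a Hudi table (hypothetical helper).

    df        -- a pyspark.sql.DataFrame holding the rows to write
    options   -- dict of Hudi options, e.g. hudi_options_insert
    base_path -- table base path; Hudi keeps its own files under
                 <base_path>/.hoodie
    mode      -- Spark save mode; "overwrite" creates the table fresh
    """
    (
        df.write.format("hudi")   # Hudi datasource for Spark
        .options(**options)       # pass every key/value as a write option
        .mode(mode)
        .save(base_path)
    )


# Example call (placeholder path, left commented out on purpose):
# write_hudi(df, hudi_options_insert, "s3://<bucket>/<prefix>/table_p5", mode="overwrite")
```

With "hoodie.metadata.enable" set to "true" in the options, this kind of write is what produced the .hoodie/metadata directory under the table base path in my case.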
Thanks a mil.