I'm using Spark with Java to create several XML files from a large dataset.
So far I have code like this:
dataframe
    .repartition(partitions)
    .write()
    .mode(SaveMode.Overwrite)
    .format("com.databricks.spark.xml")
    .option("rootTag", "citations")
    .option("rowTag", "citation")
    .save("s3a://myfolder/output");
This code creates several output files, and the number of files is equal to partitions.
The problem is that these files are named like this: part-0000.xml, part-0001.xml, etc.
I want to rename these files, but I don't want to use .repartition(1), because I need the output to be in multiple files.
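To make the goal concrete, here is a minimal sketch of the naming I am after. It only computes the new name for each part file and does not touch S3. The class name PartFileRenamer, the method targetName, and the "citations" prefix are hypothetical, and the pattern assumes the part files are named part-<index> with an optional suffix and extension. The actual rename would presumably have to go through something like Hadoop's FileSystem.rename after the write finishes, which is not shown here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps Spark part-file names to the names I would like to have.
final class PartFileRenamer {
    // Assumed input shapes: "part-0001.xml" or "part-00000-<suffix>.xml".
    // Group 1 is the partition index, group 2 the optional file extension.
    private static final Pattern PART_FILE =
            Pattern.compile("^part-(\\d+)(?:-.*?)?(\\.\\w+)?$");

    // Returns the desired name, or empty for files that are not part files (e.g. _SUCCESS).
    static Optional<String> targetName(String partFileName, String prefix) {
        Matcher m = PART_FILE.matcher(partFileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        String extension = m.group(2) == null ? "" : m.group(2);
        return Optional.of(prefix + "-" + m.group(1) + extension);
    }

    public static void main(String[] args) {
        String[] names = {"part-0000.xml", "part-0001.xml", "_SUCCESS"};
        for (String name : names) {
            System.out.println(name + " -> " + targetName(name, "citations").orElse("(skipped)"));
        }
    }
}
```

So part-0001.xml would become citations-0001.xml, one file per partition, with the partition index kept so the names stay unique.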
I know there are similar questions on Stack Overflow, such as "How to rename spark data frame output file in AWS in spark SCALA", but none of them solves my problem.
Any help would be highly appreciated.
Thanks