I'm using Spark with Java to create several XML files from a large dataset.

So far I have code like this:

dataframe
    .repartition(partitions)
    .write()
    .format("com.databricks.spark.xml")
    .option("rootTag", "citations")
    .option("rowTag", "citation")
    .mode(SaveMode.Overwrite)
    .save("s3a://myfolder/output");

This code creates several output files, and the number of files is equal to partitions.

The problem is that these files are named like this: part-0000.xml, part-0001.xml, etc.

I want to rename these files, but I don't want to use .repartition(1), because I need the output to be in multiple files.
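To make the goal concrete, here is a minimal sketch of the kind of post-write rename step I have in mind. It is not a working solution for my case: it runs against a local directory with java.nio so it can be tried without a cluster, and the class name PartFileRenamer, the method renameParts, and the "citations" prefix are hypothetical names made up for illustration. On s3a the same loop would presumably have to go through Hadoop's FileSystem API instead, and a rename on S3 is a copy followed by a delete, so it is not cheap for large files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PartFileRenamer {

    // Renames every file matching part-* in outputDir to <prefix>-<index>.xml
    // and returns the new paths. Files such as _SUCCESS or hidden .crc files
    // do not match the glob and are left untouched.
    public static List<Path> renameParts(Path outputDir, String prefix) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Sort so that the new index follows the original part order.
        Collections.sort(parts);

        List<Path> renamed = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) {
            Path target = outputDir.resolve(String.format("%s-%04d.xml", prefix, i));
            renamed.add(Files.move(parts.get(i), target));
        }
        return renamed;
    }
}
```

Calling PartFileRenamer.renameParts(outputDir, "citations") after the write would keep one file per partition and only change the names.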

I know there are similar questions on Stack Overflow, such as "How to rename spark data frame output file in AWS in spark SCALA", but none of them solves my problem.

Any help would be highly appreciated.

Thanks

Nemanja