Renaming all output files in Spark

Question

I'm using Spark with Java to create several XML files from a large dataset.

So far I have code like this:

dataframe
        .repartition(partitions)
        .write()
        .mode(SaveMode.Overwrite)  // set the save mode once; the later .mode("overwrite") was redundant
        .format("com.databricks.spark.xml")
        .option("rootTag", "citations")
        .option("rowTag", "citation")
        .save("s3a://myfolder/output");

This code creates several output files, and the number of files equals the number of partitions.

The problem is that these files are named like this: part-0000.xml, part-0001.xml, etc.

I want to rename these files, but I don't want to use .repartition(1), because I need the output to be in multiple files.

I know there are similar questions on Stack Overflow, like this one: How to rename spark data frame output file in AWS in spark SCALA, but none of them solves my problem.

Any help would be highly appreciated.

Thanks
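
For illustration only, here is a minimal sketch of one possible direction: leave the write as it is and rename the part files in a separate step afterwards. The class PartFileNames, the method targetName, and the "citations" prefix are hypothetical names that are not part of Spark or spark-xml. The sketch also assumes the part files look like part-<index>, optionally followed by a suffix and an extension. It only computes the new name for each file and does not touch storage.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Spark or spark-xml API): maps a Spark part-file
// name to a custom name while keeping the partition index and the extension.
class PartFileNames {

    // Assumed naming shape: "part-<digits>", an optional suffix without dots
    // (for example "-<uuid>"), then an optional extension starting at the first dot.
    private static final Pattern PART =
            Pattern.compile("^part-(\\d+)[^.]*(\\..+)?$");

    // Returns the new file name, or Optional.empty() for files that are not
    // part files (for example the _SUCCESS marker).
    static Optional<String> targetName(String partFileName, String prefix) {
        Matcher m = PART.matcher(partFileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        String index = m.group(1);
        String extension = m.group(2) == null ? "" : m.group(2);
        return Optional.of(prefix + "-" + index + extension);
    }
}
```

After save(...) returns, the driver could list the output directory with Hadoop's FileSystem.globStatus and call FileSystem.rename(source, target) for each part file, using targetName to build the target path. Keep in mind that on S3 a rename is carried out as a copy followed by a delete, so this step can be slow for large files.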
