How to set filename while writing in Pyspark?

Question

I'm saving a dataframe to csv with the following code:

df.write \
    .option("header", True) \
    .mode("overwrite") \
    .option("sep", "|") \
    .format("csv") \
    .save("filepath")

This saves the file as part-xxx-xx.csv.

I want to save the file as Tablename.csv. How can I achieve this?

Israel Phiri · Answer 1 · 2022-02-01T09:31:02.157

Spark does not let you choose the output filename when writing, because each partition is written as its own part file. You can, however, use the Hadoop FileSystem API to rename the part file after the write, for example in Scala:

import org.apache.hadoop.fs.{FileSystem, Path}

// Get the filesystem configured for the current Spark session
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

val partCSV = new Path("/your-path-here/part-xxx-xx.csv")
val tablenameCSV = new Path("/your-path-here/Tablename.csv")

// Rename the part file
fs.rename(partCSV, tablenameCSV)

see: https://sparkbyexamples.com/spark/spark-rename-and-delete-file-directory-from-hdfs/
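
Since the question uses PySpark, the same rename can probably be done from Python through the JVM gateway. The following is only a minimal sketch under some assumptions: `rename_part_file` is a hypothetical helper name, `spark._jvm` and `spark._jsc` are private PySpark attributes that may change between versions, and the output directory is assumed to contain exactly one part file (for instance because the DataFrame was written with `coalesce(1)`). The target path can point outside the output folder, which would leave the CSV itself named Tablename.csv.

```python
def rename_part_file(spark, output_dir, target_path):
    """Rename the single part-*.csv file in output_dir to target_path.

    Assumes `spark` is an active SparkSession and that the write
    produced exactly one part file (e.g. via df.coalesce(1)).
    """
    # Private PySpark handles to the JVM and the Hadoop configuration.
    jvm = spark._jvm
    hadoop_conf = spark._jsc.hadoopConfiguration()
    Path = jvm.org.apache.hadoop.fs.Path

    # Resolve the filesystem that owns the output directory.
    fs = Path(output_dir).getFileSystem(hadoop_conf)

    # Find the part file, since its exact name is not known in advance.
    matches = fs.globStatus(Path(output_dir + "/part-*.csv"))
    if matches is None or len(matches) != 1:
        raise ValueError("expected exactly one part file in " + output_dir)

    # FileSystem.rename returns a boolean success flag.
    return fs.rename(matches[0].getPath(), Path(target_path))


# Hypothetical usage with the DataFrame from the question:
# df.coalesce(1).write.option("header", True).mode("overwrite") \
#     .option("sep", "|").format("csv").save("filepath")
# rename_part_file(spark, "filepath", "Tablename.csv")
```

Note that `coalesce(1)` funnels all rows through a single task, so this approach is best suited to outputs small enough for one executor to write.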

edited Feb 01 '22 at 09:31

answered Feb 01 '22 at 07:25

Israel Phiri

Your solution creates a Tablename folder, and inside that folder the CSV files are still saved as part-xxx-xx.csv. I need the CSV file itself to be named Tablename.csv. – krx Feb 01 '22 at 08:17