
I have a Spark DataFrame (hereafter spark_df) and I'd like to convert it to .csv format. I tried the following two methods:

spark_df_cut.write.csv('/my_location/my_file.csv')
spark_df_cut.repartition(1).write.csv("/my_location/my_file.csv", sep=',')

where I get no error message for either of them and both seem to complete, but I cannot find any output .csv file in the target location. Any suggestions?

I'm on a cloud-based Jupyter notebook using Spark 2.3.1.

Rotail
    Possible duplicate of [Saving dataframe to local file system results in empty results](https://stackoverflow.com/questions/51603404/saving-dataframe-to-local-file-system-results-in-empty-results) – user10938362 Jun 13 '19 at 17:32
  • seems both questions are different. – Suresh Jun 14 '19 at 00:13

2 Answers

spark_df_cut.write.csv('/my_location/my_file.csv')
# creates a directory named my_file.csv at the specified path
# and writes the data in CSV format into part-* files inside it

Spark does not let you control the names of the output files when writing a DataFrame, so look for a directory named my_file.csv in your location (/my_location/my_file.csv).

If you want a single file with a name ending in .csv, you need to rename the part file afterwards, e.g. using the fs.rename method.
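If the output directory landed on the local filesystem (rather than HDFS), a plain-Python rename is a simple alternative to the Hadoop fs.rename API. A minimal sketch; the helper name rename_spark_output is hypothetical, and it assumes the directory holds exactly one part-* file (i.e. you wrote with repartition(1)):

```python
import glob
import shutil

def rename_spark_output(output_dir, target_path):
    """Move the single part-* file that Spark wrote into output_dir
    to target_path, giving it a proper .csv name."""
    part_files = glob.glob(output_dir + "/part-*")
    if len(part_files) != 1:
        raise ValueError("expected exactly one part file, found %d" % len(part_files))
    shutil.move(part_files[0], target_path)
```

For example, rename_spark_output('/my_location/my_file.csv', '/my_location/real.csv') would leave you with a single real.csv file.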

notNull

spark_df_cut.write.csv saves the data as part files. There is no direct way in Spark to save a single .csv file that can be opened directly with Excel or similar tools, but there are several workarounds. One is to convert the Spark DataFrame to a pandas DataFrame and use its to_csv method, like below:

df = spark.read.csv(path='game.csv', sep=',')
pdf = df.toPandas()
pdf.to_csv(path_or_buf='<path>/real.csv')

This will save the data as a single .csv file. Note that toPandas() collects the whole dataset onto the driver, so this only works when the data fits in the driver's memory.
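To make the pandas step concrete, here is a self-contained sketch; the toy DataFrame stands in for the toPandas() result, and the column names are invented for illustration:

```python
import pandas as pd

# Stand-in for pdf = df.toPandas(); a toy DataFrame for illustration
pdf = pd.DataFrame({"player": ["a", "b"], "score": [1, 2]})

# index=False keeps pandas' row index out of the output file
pdf.to_csv("real.csv", index=False)
```

The resulting real.csv is an ordinary single file that Excel and other tools can open directly.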

Another approach is to open the part files with hdfs commands and cat them into a single file. Please post if you need more help.
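For the hdfs route, the getmerge subcommand does that concatenation in one step. A sketch, assuming the output directory from the question and a hypothetical local target path:

```shell
# Concatenate all part-* files under the HDFS output directory
# into a single CSV on the local filesystem
hdfs dfs -getmerge /my_location/my_file.csv /tmp/real.csv
```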

Suresh