
I have a Spark DataFrame (hereafter spark_df) and I'd like to convert it to .csv format. I tried the following two methods:

spark_df_cut.write.csv('/my_location/my_file.csv')
spark_df_cut.repartition(1).write.csv("/my_location/my_file.csv", sep=',')

where I get no error message for either of them and both seem to complete, but I cannot find any output .csv file in the target location. Any suggestions?

I'm on a cloud-based Jupyter notebook using Spark 2.3.1.

Rotail
    Possible duplicate of [Saving dataframe to local file system results in empty results](https://stackoverflow.com/questions/51603404/saving-dataframe-to-local-file-system-results-in-empty-results) – user10938362 Jun 13 '19 at 17:32
  • seems both questions are different. – Suresh Jun 14 '19 at 00:13

2 Answers

spark_df_cut.write.csv('/my_location/my_file.csv')
# creates a directory named my_file.csv at the specified path
# and writes the data in CSV format into part-* files inside it

Spark does not let you control the names of the output files when writing a DataFrame, so look for a directory named my_file.csv in your location (/my_location/my_file.csv).

If you want a single file with a name ending in .csv, you need to rename the part file afterwards, e.g. using the fs.rename method.
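If the output directory landed on the local filesystem (rather than HDFS), a plain-Python rename is a simple alternative to the Hadoop fs.rename API. A minimal sketch; the helper name rename_spark_output is hypothetical, and it assumes the directory holds exactly one part-* file (i.e. you wrote with repartition(1)):

```python
import glob
import shutil

def rename_spark_output(output_dir, target_path):
    """Move the single part-* file that Spark wrote into output_dir
    to target_path, giving it a proper .csv name."""
    part_files = glob.glob(output_dir + "/part-*")
    if len(part_files) != 1:
        raise ValueError("expected exactly one part file, found %d" % len(part_files))
    shutil.move(part_files[0], target_path)
```

For example, rename_spark_output('/my_location/my_file.csv', '/my_location/real.csv') would leave you with a single real.csv file.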

notNull

spark_df_cut.write.csv saves the data as part files. There is no direct way in Spark to save a single .csv file that can be opened directly with Excel or similar tools, but there are several workarounds. One is to convert the Spark DataFrame to a pandas DataFrame and use its to_csv method, like below:

df = spark.read.csv(path='game.csv', sep=',')
pdf = df.toPandas()
pdf.to_csv(path_or_buf='<path>/real.csv')

This will save the data as a single .csv file. Note that toPandas() collects the whole dataset onto the driver, so this only works when the data fits in the driver's memory.
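To make the pandas step concrete, here is a self-contained sketch; the toy DataFrame stands in for the toPandas() result, and the column names are invented for illustration:

```python
import pandas as pd

# Stand-in for pdf = df.toPandas(); a toy DataFrame for illustration
pdf = pd.DataFrame({"player": ["a", "b"], "score": [1, 2]})

# index=False keeps pandas' row index out of the output file
pdf.to_csv("real.csv", index=False)
```

The resulting real.csv is an ordinary single file that Excel and other tools can open directly.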

Another approach is to open the part files with hdfs commands and cat them into a single file. Please post if you need more help.
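For the hdfs route, the getmerge subcommand does that concatenation in one step. A sketch, assuming the output directory from the question and a hypothetical local target path:

```shell
# Concatenate all part-* files under the HDFS output directory
# into a single CSV on the local filesystem
hdfs dfs -getmerge /my_location/my_file.csv /tmp/real.csv
```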

Suresh