
I have some Python code that loops through files and creates a pandas DataFrame (DF). I am also converting the pandas DF to a Spark DF. This works fine.

# convert python df to spark df and export the spark df
spark_df = spark.createDataFrame(DF)

Now, I am trying to save the Spark DF as a CSV file.

## Write Frame out as Table
spark_df.write.mode("overwrite").save("dbfs:/rawdata/AAA.csv")

The code directly above runs, but it doesn't create the CSV, or at least I can't find it where I would expect it to be. Is there a way to do this?

halfer
ASH
  • Ok, this is weird. When I run this: `spark_df.write.csv("dbfs:/rawdata/AAA.csv")`, it says the file already exists, but I literally can't see it anywhere! – ASH Oct 13 '19 at 17:37
  • Run `dbfs ls -r /rawdata` and print the output here; otherwise we can't actually judge what is happening. – Ram Ghadiyaram Oct 13 '19 at 18:23
  • @asher Are you using a Databricks notebook? To look at files, click the "Data" icon on the left panel of the notebook, then click "Add data" at the top of that panel, then "DBFS", and you will see the file you wrote there. The write looks promising with the code you ran. – Karthick Oct 13 '19 at 18:46
  • Oh, yes, that's a new trick for me. I haven't seen/tried that before. I did what you suggested, but I don't think that helps me get my data. I see 'AAA.csv', which is literally the name of my file, but I still don't see how I can download the results to my desktop. I know the cloud doesn't recognize my desktop, 'per se', but there must be a way to extract items from the lake. It shouldn't be this hard. Ugh. Thanks for the effort. – ASH Oct 13 '19 at 21:42
  • @asher If you are able to see the file in `dbfs`, then your question is how to download a file from `dbfs` to your `local machine`. Please close this question and open a new one. – Gaurang Shah Oct 13 '19 at 22:43
  • @asher, see if this post helps you download your file: https://stackoverflow.com/questions/49019706/databricks-download-a-dbfs-filestore-file-to-my-local-machine – Karthick Oct 19 '19 at 08:45

1 Answer


Spark takes the path of an output directory, not an output file, when writing a DataFrame, so the path you provided, "dbfs:/rawdata/AAA.csv", will create a directory named AAA.csv, not a file. You need to look for a directory instead of a file. Inside that directory you will get multiple CSV files based on your number of executors.

Nikhil Suthar
  • "In directory you will get multiple csv file based on your number of executors." This is incorrect. – Karthick Oct 19 '19 at 08:41