
I have some Python code that loops through files and creates a pandas DataFrame (DF). I am also converting the pandas DF to a Spark DF. This works fine.

# convert python df to spark df and export the spark df
spark_df = spark.createDataFrame(DF)

Now, I am trying to save the Spark DF as a CSV file.

## Write Frame out as Table
spark_df.write.mode("overwrite").save("dbfs:/rawdata/AAA.csv")

The code directly above runs, but it doesn't create the CSV, or at least I can't find it where I would expect it to be. Is there a way to do this?

halfer
ASH
  • Ok, this is weird. When I run this: `spark_df.write.csv("dbfs:/rawdata/AAA.csv")`, it says the file already exists, but I literally can't see it anywhere! – ASH Oct 13 '19 at 17:37
  • Run `dbfs ls -r /rawdata` and print the output here; otherwise we can't actually judge what is happening. – Ram Ghadiyaram Oct 13 '19 at 18:23
  • @asher Are you using a Databricks notebook? To look at files, click the "Data" icon on the left panel of the notebook, then click "Add data" at the top of that panel, then "DBFS", and you will see the file you wrote there. The write looks promising with the code you ran. – Karthick Oct 13 '19 at 18:46
  • Oh, yes, that's a new trick for me. I haven't seen/tried that before. I did what you suggested, but I don't think that helps me get my data. I see 'AAA.csv', which is literally the name of my file, but I still don't see how I can download the results to my desktop. I know the cloud doesn't recognize my desktop, 'per se', but there must be a way to extract items from the lake. It shouldn't be this hard. Ugh. Thanks for the effort. – ASH Oct 13 '19 at 21:42
  • @asher If you are able to see the file in `dbfs`, then your question is how to download a file from `dbfs` to your `local machine`. Please close this question and open a new one. – Gaurang Shah Oct 13 '19 at 22:43
  • @asher, see if this post helps you download your file: https://stackoverflow.com/questions/49019706/databricks-download-a-dbfs-filestore-file-to-my-local-machine – Karthick Oct 19 '19 at 08:45

1 Answer


Spark takes the path of an output directory, not an output file, when writing a DataFrame, so the path you provided, "dbfs:/rawdata/AAA.csv", will create a directory named AAA.csv, not a file. You need to look for a directory instead of a file. Inside that directory you will get multiple CSV files based on your number of executors.

Nikhil Suthar
  • "In directory you will get multiple csv file based on your number of executors." This is incorrect. – Karthick Oct 19 '19 at 08:41