
I have found multiple results on how to save a DataFrame as CSV to disk on the Databricks platform, e.g. "Spark Dataframe save as CSV" and "How to save a spark DataFrame as csv on disk?".

However, whenever I try to apply the answers to my situation, they fail. Therefore, I am submitting my own question on the issue here.

I generate the DataFrame with the following code:

# file_type, infer_schema, first_row_is_header, delimiter and file_location
# are configuration variables set earlier in the notebook
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

I would now like to save the above dataframe to disk.

I have tried the following:

filepath = "/FileStore/tables"
df.coalesce(1).write.option("header","true").option("sep",",").mode("overwrite").csv("filepath")

But I get an error when I run this (shown as a screenshot in the original post).

Can someone let me know where I'm going wrong?

Carltonp
I managed to figure out why I was getting the above error - it's because I was trying to write to a Community Edition of Databricks. Everything worked fine when I applied the code to a paid-for Databricks platform. However, the file is being saved as ```part-00000-tid-3693777652656899971-46f65adb-4641-446f-863f-eade3e2b3155-2-1-c000.csv```. Can someone let me know how to rename the file to something more meaningful? – Carltonp Oct 24 '19 at 13:13

1 Answer


Sharing the answer as per the comment by the original poster:

"I managed to figure out why I was getting the above error - its because I was trying to write to a Community Edition of Databricks. Everything worked fine when I applied to code to a paid for Databricks platform".

Answering the question from the comment:

Can someone let me know how to rename the file to something more meaningful?

It's not possible to change the file name directly as part of Spark's save.

Spark writes output using the Hadoop file format, which requires the data to be partitioned - that's why you get part-* files. You can easily rename the file after the write completes, as in the sketch below.
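As a minimal sketch, assuming this runs in a Databricks notebook where dbutils is available (the output directory and target file name below are illustrative, not from the original post):

# Write the DataFrame as a single part file, then copy it to a friendlier name.
# output_dir and the target name are illustrative placeholders.
output_dir = "/FileStore/tables/my_output"
df.coalesce(1).write.option("header", "true").option("sep", ",").mode("overwrite").csv(output_dir)

# Spark names the single output file part-00000-...; locate it and copy it.
part_file = [f.path for f in dbutils.fs.ls(output_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part_file, "/FileStore/tables/my_output.csv")
dbutils.fs.rm(output_dir, True)  # optionally remove the original output directory

Alternatively, if the DataFrame is small enough to collect to the driver, df.toPandas().to_csv(...) gives full control over the file name.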

You may also refer to the similar SO thread, which addresses the same issue.

Hope this helps.

CHEEKATLAPRADEEP