
I am quite new to PySpark. I am trying to read and then save a CSV file using Azure Databricks.

After saving the file I see many other files such as "_committed", "_started" and "_SUCCESS", and finally the CSV file itself with a totally different name.

I have already tried repartition(1) and coalesce(1) on the DataFrame, but that only addresses the case where Spark splits the CSV into multiple part files; it does not get rid of the extra files or give the output the name I want. Is there anything that can be done using PySpark?
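For context, here is roughly what I am doing (a minimal sketch; the mount paths are just placeholders, and `spark` is the session that Databricks provides):

```python
# Read a CSV into a Spark DataFrame and write it back out to DBFS.
df = spark.read.csv("/mnt/input/data.csv", header=True, inferSchema=True)

# The output path becomes a directory containing _started_*, _committed_*,
# _SUCCESS and a part-00000-<uuid>.csv file, rather than a single CSV
# with the name I chose.
df.write.mode("overwrite").option("header", True).csv("/mnt/output/result")
```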

Alex Ott
Prakazz
  • Does this answer your question? [How do you write a CSV back to Azure Blob Storage using Databricks?](https://stackoverflow.com/questions/63851044/how-do-you-write-a-csv-back-to-azure-blob-storage-using-databricks) – Axel R. Jul 01 '21 at 15:51

2 Answers


You can do the following:

df.toPandas().to_csv("path/to/file.csv")

It will create a single CSV file, as you expect.
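A slightly fuller sketch of this approach (the paths here are only placeholders; note that pandas writes through the local file API, so on Databricks a DBFS location is usually addressed with the /dbfs/ prefix):

```python
# Read with Spark, then collect to the driver as a pandas DataFrame
# and write a single CSV with exactly the name you want.
df = spark.read.csv("/mnt/input/data.csv", header=True, inferSchema=True)

# toPandas() pulls every row onto the driver, so this is only practical
# when the data fits comfortably in driver memory.
df.toPandas().to_csv("/dbfs/mnt/output/result.csv", index=False)
```

Keep in mind that toPandas() collects the entire DataFrame to the driver, so this only works for data small enough to fit in driver memory.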

Axel R.

Those are metadata/marker files that Databricks creates by default when saving from PySpark; you can't eliminate them through the DataFrame writer. Using coalesce(1) you can at least save the data as a single file instead of multiple partitions.
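A minimal sketch of that (the output path is a placeholder; Spark still writes a directory, but it will contain only one part file plus the marker files):

```python
# Coalesce to one partition so Spark writes a single part-00000-*.csv
# file. The _SUCCESS, _committed_* and _started_* markers are still
# created by the commit protocol.
(df.coalesce(1)
   .write
   .mode("overwrite")
   .option("header", True)
   .csv("/mnt/output/result"))
```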

Robinhood