Say I have a Spark DataFrame which I want to save as a CSV file. As of Spark 2.0.0, the DataFrameWriter class directly supports saving it as a CSV file.
The default behavior is to save the output in multiple part-*.csv files inside the path provided.
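For reference, this is roughly what the default write looks like and the kind of output it produces (the path and part-file names below are just placeholders for illustration):

```scala
// Plain CSV write, no coalesce:
df.write
  .option("header", "true")
  .csv("/tmp/output_dir")

// The result is a directory, not a single file, e.g.:
//   /tmp/output_dir/_SUCCESS
//   /tmp/output_dir/part-00000-....csv
//   /tmp/output_dir/part-00001-....csv
```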
How would I save a DF with:
- The path mapping to the exact file name instead of a folder
- The header in the first line
- Output as a single file instead of multiple files
One way to deal with this is to coalesce the DF and then save the file:
df.coalesce(1).write.option("header", "true").csv("sample_file.csv")
However, this has the disadvantage of collecting all the data onto a single machine, which therefore needs enough memory to hold the entire DataFrame.
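For completeness, this is roughly what the full workaround looks like on my side, assuming the Hadoop FileSystem API is used to rename the lone part file to the exact file name I want (paths, the `spark` session variable, and file names are placeholders, not part of any Spark-provided helper):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Force everything through a single partition (the step I would like to avoid).
df.coalesce(1)
  .write
  .option("header", "true")
  .csv("/tmp/sample_dir")

// Rename the single part file to the exact file name I actually want.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val partFile = fs.globStatus(new Path("/tmp/sample_dir/part-*.csv"))(0).getPath
fs.rename(partFile, new Path("/tmp/sample_file.csv"))
```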
Is it possible to write a single CSV file without using coalesce? If not, is there a more efficient way than the code above?