How can I export Spark's DataFrame to csv file using Scala?
4 Answers
The easiest and best way to do this is to use the spark-csv
library. You can check the documentation in the provided link; here
is a Scala example of how to load and save data from/to a DataFrame.
Code (Spark 1.4+):
dataFrame.write.format("com.databricks.spark.csv").save("myFile.csv")
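For completeness, reading a CSV back into a DataFrame with the same library would look roughly like this (a sketch; the header and inferSchema options are illustrative):
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // treat the first row as column names
  .option("inferSchema", "true") // guess column types from the data
  .load("myFile.csv")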
Edit:
Spark writes the CSV data as a directory of part-files. If you want to merge the part-files into a single CSV, refer to the sketch below.
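This isn't spelled out in the answer itself, but a common approach is Hadoop's FileUtil.copyMerge, which concatenates all the part-files in a directory into a single file (a sketch; note that copyMerge exists in Hadoop 2.x and was removed in Hadoop 3):
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val conf = new Configuration()
val fs = FileSystem.get(conf)
FileUtil.copyMerge(
  fs, new Path("myFile.csv"),       // source: the directory of part-files
  fs, new Path("myFileMerged.csv"), // destination: a single merged file
  false,                            // keep the source directory
  conf, null)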

In Spark versions 2+ you can simply use the following:
df.write.csv("/your/location/data.csv")
If you want to make sure the output is no longer partitioned, add a .coalesce(1)
as follows:
df.coalesce(1).write.csv("/your/location/data.csv")
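Note that data.csv here names an output directory rather than a single file, and headers are not written by default; a minimal sketch enabling them in Spark 2+:
df.coalesce(1)
  .write
  .option("header", "true") // include column names as the first row
  .csv("/your/location/data.csv")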

- Can we rename the part_0000 file? – Shringa Bais Aug 01 '18 at 20:01
- You can easily rename it after it's written out by using `cp` (or `hdfs dfs -cp` if the file is still in HDFS) to copy the file to its current location but with the new name (a programmatic sketch follows these comments).
- Please note this doesn't export with headers – Prashant Shubham Oct 12 '21 at 12:01
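As the second comment suggests, the rename can also be done programmatically. A minimal sketch using Hadoop's FileSystem API (the output paths and the spark session name are illustrative, not from the answers above):
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// locate the single part-file inside the output directory...
val partFile = fs.globStatus(new Path("/your/location/data.csv/part-*"))(0).getPath
// ...and move it to a stable, human-friendly name
fs.rename(partFile, new Path("/your/location/final.csv"))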
The above solution exports the CSV as multiple partitions. I found another solution by zero323 on this Stack Overflow page that exports a DataFrame into one single CSV file when you use coalesce:
df.coalesce(1)
.write.format("com.databricks.spark.csv")
.option("header", "true")
.save("/your/location/mydata")
This would create a directory named mydata where you'll find a csv file that contains the results.

A method to export and rename the file (Databricks-specific, as it relies on dbutils; the DataFrame to export is passed in as a parameter):
def export_csv(
    df: org.apache.spark.sql.DataFrame,
    fileName: String,
    filePath: String
): Unit = {
  val filePathDestTemp = filePath + ".dir/"
  // write a single part-file into a temporary directory
  df.coalesce(1)
    .write
    .option("header", "true")
    .mode("overwrite")
    .csv(filePathDestTemp)
  // copy the part-file to the desired name, then remove the temp directory
  for (subFile <- dbutils.fs.ls(filePathDestTemp)) {
    if (subFile.name.endsWith(".csv")) {
      dbutils.fs.cp(filePathDestTemp + subFile.name, filePath + fileName + ".csv")
      dbutils.fs.rm(filePathDestTemp, recurse = true)
    }
  }
}
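Usage might look like this (the DataFrame name and paths are hypothetical):
// writes /mnt/output/report.csv and cleans up the temporary .dir/ directory
export_csv(resultsDf, "report", "/mnt/output/")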
