How to export DataFrame to csv in Scala?

Question

How can I export a Spark DataFrame to a CSV file using Scala?

karthik manchala · Accepted Answer · 2020-01-06T07:09:52.630

The easiest way to do this is to use the spark-csv library. Its documentation has the details, including a Scala example of how to load data into and save data from a DataFrame.

Code (Spark 1.4+):

dataFrame.write.format("com.databricks.spark.csv").save("myFile.csv")

Edit:

Spark writes the CSV data as a set of part-files. If you want to merge the part-files into a single CSV file, refer to the following:

Merge Spark's CSV output folder to Single File

Taylrl · Answer 2 · 2019-03-06T10:54:47.933


In Spark versions 2+ you can simply use the following:

df.write.csv("/your/location/data.csv")

If you want the output written as a single part-file rather than several, add a .coalesce(1) as follows:

df.coalesce(1).write.csv("/your/location/data.csv")
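
Note that the CSV writer leaves out the header row unless asked for it. A minimal sketch of the same write with a header, assuming Spark 2+ is on the classpath; the object name, the sample data, the local master and the /tmp output path are all made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object CsvExportExample {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration only; on a cluster the master is set elsewhere.
    val spark = SparkSession.builder()
      .appName("csv-export-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Small illustrative DataFrame.
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    df.coalesce(1)                 // one partition, so one part-file
      .write
      .option("header", "true")    // write column names as the first line
      .mode("overwrite")           // replace the output directory if it exists
      .csv("/tmp/csv-export-example")

    spark.stop()
  }
}
```

The path passed to csv(...) is still a directory; the data ends up in a part-file inside it.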

edited Mar 06 '19 at 10:54

answered Jul 13 '18 at 11:50


Can we rename the part_0000 file? – Shringa Bais Aug 01 '18 at 20:01
You can easily rename it after it's written out if you wish by using `cp` (or `hdfs dfs -cp` if the file is still in HDFS) to copy the file to its current location but with the new name – Taylrl Oct 03 '18 at 10:27
Please note this doesn't export with headers – Prashant Shubham Oct 12 '21 at 12:01

Abu Shoeb · Answer 3 · 2019-03-11T23:35:34.457


The above solution exports the CSV as multiple part-files. I found another solution by zero323 on this Stack Overflow page that exports a DataFrame into a single CSV file by using coalesce.

df.coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/your/location/mydata")

This creates a directory named mydata, in which you'll find a single CSV part-file that contains the results.
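
If that directory is on the local filesystem (not HDFS), the part-file can be moved to a fixed name with plain java.nio. This is only a sketch: the object and method names are hypothetical, and it assumes the part-file name starts with "part-", as Spark's output normally does.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object PartFileRenamer {
  /** Moves the first "part-*" file found in `outputDir` to `target`.
    * Returns the new path, or None if no part-file was found. */
  def promotePartFile(outputDir: Path, target: Path): Option[Path] = {
    // The glob skips _SUCCESS and hidden .crc files, which do not start with "part-".
    val partFile: Option[Path] = {
      val stream = Files.newDirectoryStream(outputDir, "part-*")
      try {
        val it = stream.iterator()
        if (it.hasNext) Some(it.next()) else None
      } finally stream.close()
    }
    // Overwrite any file left over from a previous export.
    partFile.map(p => Files.move(p, target, StandardCopyOption.REPLACE_EXISTING))
  }
}
```

For example, PartFileRenamer.promotePartFile(Paths.get("/your/location/mydata"), Paths.get("/your/location/mydata.csv")) would leave the results in mydata.csv (Paths comes from java.nio.file).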

edited Mar 11 '19 at 23:35

answered Oct 29 '17 at 19:01


score 0 · Answer 4 · answered Oct 28 '21 at 11:12

A method to export and rename the file:

// Assumes a Databricks notebook: `spark` and `dbutils` are provided by the
// runtime, and `merstageout` is a SQL query string defined elsewhere.
def export_csv(
  fileName: String,
  filePath: String
): Unit = {

  // Spark writes a directory of part-files, so write to a temporary one first.
  val filePathDestTemp = filePath + ".dir/"
  val merstageout_df = spark.sql(merstageout)

  merstageout_df
    .coalesce(1) // a single partition yields a single part-file
    .write
    .option("header", "true")
    .mode("overwrite")
    .csv(filePathDestTemp)

  // Copy the part-file to its final name, then remove the temporary directory.
  dbutils.fs.ls(filePathDestTemp)
    .find(_.name.endsWith(".csv"))
    .foreach { partFile =>
      dbutils.fs.cp(filePathDestTemp + partFile.name, filePath + fileName + ".csv")
      dbutils.fs.rm(filePathDestTemp, recurse = true)
    }
}
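
The method above relies on dbutils, which is only available on Databricks. Outside Databricks, a similar export-and-rename can be sketched with the Hadoop FileSystem API that ships with Spark. The object name, method name and parameters here are hypothetical, and the sketch assumes Spark 2+ and that the temporary directory and the target file live on the same filesystem:

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleCsvExport {
  /** Writes `df` as one CSV part-file under `tempDir`, then renames that
    * file to `targetFile`. Returns true if the rename succeeded. */
  def exportCsv(spark: SparkSession, df: DataFrame,
                tempDir: String, targetFile: String): Boolean = {
    df.coalesce(1)
      .write
      .option("header", "true")
      .mode("overwrite")
      .csv(tempDir)

    val tempPath = new Path(tempDir)
    val targetPath = new Path(targetFile)
    // Resolve the filesystem (local, HDFS, ...) from the path itself.
    val fs = tempPath.getFileSystem(spark.sparkContext.hadoopConfiguration)

    // globStatus may return null, hence the Option wrapper.
    val partFile = Option(fs.globStatus(new Path(tempPath, "part-*.csv")))
      .flatMap(_.headOption)
      .map(_.getPath)

    val renamed = partFile.exists { src =>
      fs.delete(targetPath, false) // drop a stale target; no-op if it is absent
      fs.rename(src, targetPath)
    }
    // Clean up the temporary directory only after a successful rename.
    if (renamed) fs.delete(tempPath, true)
    renamed
  }
}
```

A call such as SingleCsvExport.exportCsv(spark, df, "/your/location/mydata.dir", "/your/location/mydata.csv") would then leave a single mydata.csv behind.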
