
I am trying to save a DataFrame as a .csv file in Spark. All fields are required to be enclosed in quotes, but currently they are not.

I am using Spark 2.1.0

Code :

DataOutputResult.write.format("com.databricks.spark.csv")
  .option("header", true)
  .option("inferSchema", false)
  .option("quoteMode", "ALL")
  .mode("overwrite")
  .save(Dataoutputfolder)

Output format (actual):

Name, Id,Age,Gender

XXX,1,23,Male

Output format (required):

"Name","Id","Age","Gender"

"XXX","1","23","Male"

Options I tried so far:

quoteMode and quote in the options while writing the file, but with no success.

koiralo
rakesh jayaram
2 Answers


Replace quoteMode with quote: .option("quote", "all"),

or play with concat or concat_ws directly on the df columns and save without the quote option,

import org.apache.spark.sql.functions.{concat, lit}

val newDF = df.select(concat($"Name", lit("\""), $"Age"))

or create your own UDF to add the desired behaviour; please find more examples in Concatenate columns in apache spark dataframe.
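The concat idea above boils down to wrapping each value in quote characters yourself before the row is written out. A minimal sketch of that logic in plain Python (no Spark, hypothetical sample values), including the standard CSV rule of doubling any embedded quotes:

```python
# Illustration of the manual-quoting idea: wrap every field in
# double quotes before joining the row. Plain Python, no Spark.
def quote_field(value):
    # Escape embedded quotes by doubling them (standard CSV rule),
    # then wrap the whole field in double quotes.
    escaped = str(value).replace('"', '""')
    return '"' + escaped + '"'

def quote_row(row):
    return ",".join(quote_field(v) for v in row)

print(quote_row(["XXX", 1, 23, "Male"]))  # "XXX","1","23","Male"
```

In Spark itself you would express the same wrapping with concat and lit on each column, as shown above.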

elcomendante

Unable to add a comment to the above answer, so posting as an answer. In Spark 2.3.1, use quoteAll:

df1.write.format("csv")
.option("header", true)
.option("quoteAll","true")
.save(Dataoutputfolder)

Also, to add to the comment of @Karol Sudol (great answer, by the way): .option("quote", "\u0000") works only if you are using PySpark with Python 3, whose default encoding is UTF-8. Several people reported that the option did not work; they must have been using PySpark with Python 2, whose default encoding is ASCII, hence the error "java.lang.RuntimeException: quote cannot be more than one character".
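For comparison, the quoteAll behaviour (every field quoted regardless of content) matches csv.QUOTE_ALL in Python's standard library. A small sketch using the sample columns from the question:

```python
import csv
import io

# csv.QUOTE_ALL quotes every field, analogous to Spark's quoteAll=true.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["Name", "Id", "Age", "Gender"])
writer.writerow(["XXX", 1, 23, "Male"])
print(buf.getvalue())
```

This is only a local illustration of the quoting policy; the Spark option does the same work in a distributed write.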

user8414391