
I am trying to save a DataFrame as a .csv file in Spark. All fields are required to be enclosed in quotes, but currently they are not.

I am using Spark 2.1.0

Code :

DataOutputResult.write.format("com.databricks.spark.csv")
  .option("header", true)
  .option("inferSchema", false)
  .option("quoteMode", "ALL")
  .mode("overwrite")
  .save(Dataoutputfolder)

Output format (actual):

Name, Id,Age,Gender

XXX,1,23,Male

Output format (required):

"Name","Id","Age","Gender"

"XXX","1","23","Male"

Options I tried so far:

quoteMode and quote in the options while writing the file, but with no success.

koiralo
rakesh jayaram
2 Answers


Replace quoteMode with quote: .option("quote", "all"),

or play with concat or concat_ws directly on the df columns and save without the quote option,

import org.apache.spark.sql.functions.{concat, lit}

val newDF = df.select(concat($"Name", lit("\""), $"Age"))

or create your own UDF to add the desired behaviour; please find more examples in Concatenate columns in apache spark dataframe.
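The concat idea above boils down to wrapping each value in quote characters yourself before the row is written out. A minimal sketch of that logic in plain Python (no Spark, hypothetical sample values), including the standard CSV rule of doubling any embedded quotes:

```python
# Illustration of the manual-quoting idea: wrap every field in
# double quotes before joining the row. Plain Python, no Spark.
def quote_field(value):
    # Escape embedded quotes by doubling them (standard CSV rule),
    # then wrap the whole field in double quotes.
    escaped = str(value).replace('"', '""')
    return '"' + escaped + '"'

def quote_row(row):
    return ",".join(quote_field(v) for v in row)

print(quote_row(["XXX", 1, 23, "Male"]))  # "XXX","1","23","Male"
```

In Spark itself you would express the same wrapping with concat and lit on each column, as shown above.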

elcomendante

Unable to add a comment to the above answer, so posting as an answer. In Spark 2.3.1, use quoteAll:

df1.write.format("csv")
.option("header", true)
.option("quoteAll","true")
.save(Dataoutputfolder)

Also, to add to the comment of @Karol Sudol (great answer, by the way): .option("quote", "\u0000") works only if you are using PySpark with Python 3, whose default encoding is UTF-8. Several people reported that the option did not work; they must have been using PySpark with Python 2, whose default encoding is ASCII, hence the error "java.lang.RuntimeException: quote cannot be more than one character".
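For comparison, the quoteAll behaviour (every field quoted regardless of content) matches csv.QUOTE_ALL in Python's standard library. A small sketch using the sample columns from the question:

```python
import csv
import io

# csv.QUOTE_ALL quotes every field, analogous to Spark's quoteAll=true.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["Name", "Id", "Age", "Gender"])
writer.writerow(["XXX", 1, 23, "Male"])
print(buf.getvalue())
```

This is only a local illustration of the quoting policy; the Spark option does the same work in a distributed write.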

user8414391