I have a .csv file whose rows contain string values, e.g.:
"315700H9VGE9BHU9DK42","""LEGOS s.r.o."", švédsky ""LEGOS bolag med begr.amsvar""","cs","",""
The second field occasionally contains strings with embedded, doubled "quote" values:
"""LEGOS s.r.o."", švédsky ""LEGOS bolag med begr.amsvar"""
When read into a Spark DataFrame, I expect this value to be parsed as:
"LEGOS s.r.o.", švédsky "LEGOS bolag med begr.amsvar"
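For comparison, the sample row is valid RFC 4180-style CSV (a literal quote inside a quoted field is escaped by doubling it). As an illustration, Python's standard-library csv reader, which uses this doubled-quote convention by default, recovers exactly the value I expect:

```python
import csv
import io

# The raw CSV line from the file; "" inside a quoted field escapes a quote
raw = '"315700H9VGE9BHU9DK42","""LEGOS s.r.o."", švédsky ""LEGOS bolag med begr.amsvar""","cs","",""\n'

# csv.reader defaults to doublequote=True, i.e. RFC 4180-style escaping
row = next(csv.reader(io.StringIO(raw)))

print(row[1])
# → "LEGOS s.r.o.", švédsky "LEGOS bolag med begr.amsvar"
```

This is the parse I am trying to reproduce in Spark.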
I have tried the following, plus variations of the commented-out options, as described in these docs:
df = (spark
    .read
    .format("csv")  # redundant with .csv(...) below, but harmless
    .option("header", True)
    .option("delimiter", ",")
    .option("multiLine", True)
    .option("escapeQuotes", True)
    .option("quote", "\"")
    .option("escape", "\"")
    # .option("escape", "\\")
    # .option("escape", '""')
    # .option("escape", "\n")
    .schema(raw_schema)
    .csv(landing_schema_file)
)
Any ideas?
I'm running Apache Spark 3.3.0 with Scala 2.12.