PySpark is not reading CSV properly around quotes

Question

This line "{""subject"":""Title"",""Headline"":""test head""}",2021-01-01 should have two columns, but reading it with

spark.read.option("header", "true").option("inferSchema", "true").option("multiLine","true").csv('test.csv') gives me three columns.

I think the issue is that PySpark does not treat two quotes in a row ("") as an escaped quote inside a quoted field.
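To make the expectation concrete, here is a minimal sketch using Python's standard `csv` module (not Spark). That module treats a doubled quote inside a quoted field as one literal quote by default, so it shows the two-field result the question is after. Spark's CSV reader, by contrast, uses a backslash as its default escape character.

```python
import csv
import io

# The exact data line from the question.
line = '"{""subject"":""Title"",""Headline"":""test head""}",2021-01-01'

# csv.reader defaults to doublequote=True, so "" inside a quoted
# field is read as a single literal quote character.
fields = next(csv.reader(io.StringIO(line)))

print(len(fields))  # number of parsed fields
print(fields[0])    # the JSON payload with its quotes restored
print(fields[1])    # the date field
```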

Does this answer your question? [Reading csv files with quoted fields containing embedded commas](https://stackoverflow.com/questions/40413526/reading-csv-files-with-quoted-fields-containing-embedded-commas) — qaziqarta, Aug 18 '22 at 18:02
Have a look at https://stackoverflow.com/a/45138591/19032206. Adding `.option("escape", "\"")` will get you covered. — qaziqarta, Aug 18 '22 at 18:03

score 0 · Answer 1 · answered Aug 19 '22 at 19:04

Thanks to @qaziqarta: adding `.option("escape", "\"")` to the reader solved this.

flexwang
