This line

`"{""subject"":""Title"",""Headline"":""test head""}",2021-01-01`

should have two columns, but reading it with

`spark.read.option("header", "true").option("inferSchema", "true").option("multiLine","true").csv('test.csv')`

gives me three columns.

I think the issue is that PySpark does not treat two quotes in a row (`""`) inside a quoted field as an escaped `"`, so it splits the field at the embedded commas.

flexwang
  • Does this answer your question? [Reading csv files with quoted fields containing embedded commas](https://stackoverflow.com/questions/40413526/reading-csv-files-with-quoted-fields-containing-embedded-commas) – qaziqarta Aug 18 '22 at 18:02
  • Have a look at https://stackoverflow.com/a/45138591/19032206. Adding `.option("escape", "\"")` will get you covered. – qaziqarta Aug 18 '22 at 18:03

1 Answer

Thanks to @qaziqarta

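Adding `.option("escape", "\"")` to the reader fixed it. Spark's CSV reader uses a backslash as its default escape character; setting the escape character to `"` makes it read `""` inside a quoted field as a literal quote, so the line parses into two columns.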
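For anyone who wants to try it, here is a minimal sketch of the full read call with the extra option. The helper names `expected_fields` and `read_csv_with_quote_escape` are made up for illustration, `spark` is assumed to be an existing SparkSession, and the path is whatever file you are reading. The first helper uses Python's standard `csv` module, which treats `""` as an escaped quote by default, only to show the two fields the row is supposed to contain.

```python
import csv
import io

# The sample row from the question: a JSON string whose literal quotes are
# doubled, followed by a date.
SAMPLE = '"{""subject"":""Title"",""Headline"":""test head""}",2021-01-01\n'


def expected_fields(line):
    """Parse one CSV line with Python's csv module (doubled quotes = escaped quote)."""
    return next(csv.reader(io.StringIO(line)))


def read_csv_with_quote_escape(spark, path):
    """Read a CSV file with the quote character also used as the escape character.

    `spark` is an existing SparkSession (assumption); `path` is a placeholder.
    """
    return (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .option("multiLine", "true")
        .option("escape", '"')  # same value as "\"" in the comment above
        .csv(path)
    )
```

Usage would look like `df = read_csv_with_quote_escape(spark, "test.csv")`, after which `len(df.columns)` should match the number of fields that `expected_fields` finds for a row.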

flexwang