Ignore backslash-quote inside the csv field PySpark

Asked Oct 29 '21 at 11:36

Active Nov 01 '21 at 09:10

Viewed 249 times

I'm trying to read a CSV file using Spark in Databricks:

spark.read.format('csv').option('header', 'true')\
  .option('inferschema', True)\
  .option('quote', '\"')\
  .option("escape", '\"')\
  .load(path_to_csv)\
  .createOrReplaceTempView('table_name')

But it doesn't read the following line correctly:

""Sample Company",LLC"

Instead of getting:

+------------------------+
|                   col1 |
+------------------------+
|    "Sample Company",LLC|
+------------------------+

I get the following result:

+------------------+--------------------+
|             col1 |                col2|
+------------------+--------------------+
|""Sample Company" | LLC"               |
+------------------+--------------------+

I've tried different combinations of the "quote" and "escape" options, but nothing has worked so far.
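
One possible workaround, sketched here in plain Python rather than Spark, is to bypass CSV quote handling for such lines and strip only the outermost pair of double quotes. This assumes each affected line holds a single field wrapped in one outer pair of quotes, which the expected output above suggests but the question does not confirm. The helper name unwrap_single_field is hypothetical.

```python
def unwrap_single_field(line: str) -> str:
    """Strip one outer pair of double quotes; keep inner quotes and commas.

    Assumption: the whole line is a single field, as in the expected output.
    """
    text = line.rstrip("\r\n")
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    # Not wrapped in quotes: return the line unchanged.
    return text


sample = '""Sample Company",LLC"'
print(unwrap_single_field(sample))  # prints "Sample Company",LLC
```

In Spark, the same idea might be applied by loading the file as plain text lines instead of CSV and cleaning each line before splitting it into columns. That variant is untested here.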

edited Nov 01 '21 at 09:10

Max_Process

Ref: https://stackoverflow.com/questions/52704937/how-to-read-a-csv-file-with-commas-within-a-field-using-pyspark – Karthikeyan Rasipalay Durairaj Oct 29 '21 at 15:30

0 Answers