I have a PySpark DataFrame in which one of the fields contains commas. Sample data:
+--------+------------------------------------------------------------------------------------+
|id      |reason                                                                              |
+--------+------------------------------------------------------------------------------------+
|123-8aab|Request for the added "Hello Ruth, How are, you, doing and Other" abc. Thanks!      |
|125-5afs|Hi Prachi, I added an "XYZA", like josa.coam, row. "Uid to be eligible" for clarity.|
+--------+------------------------------------------------------------------------------------+
When I write this to CSV, the data spills over into the next column and is not represented correctly. This is the code I am using to write the data, followed by the output:
df_csv.repartition(1).write.format('csv').option("header", "true").save(
    "s3://{}/report-csv".format(bucket_name), mode='overwrite')
How the data appears in the CSV:
Any help would really be appreciated. TIA.
NOTE: I think that if the field contains only commas, it exports properly; it is the combination of quotes and commas that seems to cause the issue.
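To illustrate that guess, here is a minimal sketch in plain Python (no Spark involved) using one of the sample rows. It rests on an assumption I have not confirmed against my actual output file: that Spark's CSV writer, by default, escapes embedded quotes with a backslash (\") rather than doubling them (""). The helper names write_line and read_rfc4180 are made up for this example. It shows that a reader expecting doubled quotes splits the backslash-escaped line into extra columns, while the doubled-quote version stays intact.

```python
import csv
import io

# One of the sample rows from the table above.
row = ["123-8aab",
       'Request for the added "Hello Ruth, How are, you, doing and Other" abc. Thanks!']


def write_line(row, **fmt):
    """Serialize one row to a CSV line with the given csv.writer settings."""
    buf = io.StringIO()
    csv.writer(buf, **fmt).writerow(row)
    return buf.getvalue()


def read_rfc4180(line):
    """Parse a line the way a reader expecting doubled quotes would."""
    return next(csv.reader(io.StringIO(line)))


# Assumption: this mimics Spark's default, where embedded quotes become \"
backslash_line = write_line(row, doublequote=False, escapechar="\\")
# Alternative: embedded quotes are doubled ("")
doubled_line = write_line(row)

spilled = read_rfc4180(backslash_line)  # more than two fields: the text spills
intact = read_rfc4180(doubled_line)     # exactly the original two fields

print(len(spilled), spilled)
print(len(intact), intact)
```

If this is really what is happening, I would guess the Spark-side equivalent is setting the writer's escape character to a double quote, e.g. .option("escape", "\""), but I have not verified that on my data.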