I am trying to read data from a CSV file and split each row into its respective columns.
But my regex fails when a column contains commas within itself.
e.g.: a,b,c,"d,e, g,",f
I want a result like:
a b c "d,e, g," f
which is 5 columns.
Here is the regex I am using to split the string by comma:
,(?=(?:"[^"]?(?:[^"])*))|,(?=[^"]+(?:,)|,+|$)
but it fails for a few strings while it works for others.
All I am looking for is this: when I read data from a CSV file into a DataFrame/RDD using PySpark, I want to load and preserve all the columns without any mistakes.
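For comparison, a commonly used pattern splits on a comma only when an even number of double quotes follows it, so commas inside a quoted column are left alone. This is a minimal sketch, not a full CSV parser: it assumes quotes are balanced on every line and that no field contains a line break. The names `SPLIT_OUTSIDE_QUOTES` and `split_row` are made up for illustration.

```python
import re

# Split on a comma only when an even number of double quotes follows it,
# i.e. when the comma is not inside a quoted field.
# Assumption: quotes are balanced within each line.
SPLIT_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_row(line):
    """Split one CSV line on commas that sit outside double quotes."""
    return SPLIT_OUTSIDE_QUOTES.split(line)


fields = split_row('a,b,c,"d,e, g,",f')
print(fields)       # ['a', 'b', 'c', '"d,e, g,"', 'f']
print(len(fields))  # 5
```

The quoted column keeps its surrounding quotes here, which matches the 5-column result shown above.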
Thank You
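One way to avoid a hand-written regex for the DataFrame/RDD case is to let Python's standard `csv` module do the quote handling. The sketch below is an assumption about how this could be wired up, not a tested Spark job: `parse_partition` is a hypothetical helper, and a plain list stands in for a partition so the snippet runs without Spark.

```python
import csv


def parse_partition(lines):
    """Turn an iterator of raw CSV lines into an iterator of column lists."""
    # csv.reader accepts any iterable of strings and honours double quotes,
    # so commas inside a quoted field do not start a new column.
    return csv.reader(lines, delimiter=",", quotechar='"')


# Local check without Spark: a plain list stands in for one partition.
rows = list(parse_partition(['a,b,c,"d,e, g,",f', "1,2,3,4,5"]))
print(rows[0])       # ['a', 'b', 'c', 'd,e, g,', 'f']
print(len(rows[0]))  # 5
```

Note that `csv.reader` drops the surrounding quotes from the quoted column. With an RDD of lines (for example from `sc.textFile(path)`), a function like this could presumably be passed to `rdd.mapPartitions(parse_partition)`. For a DataFrame, `spark.read.csv(path, quote='"', escape='"')` is likely simpler, since the reader treats `"` as the quote character; if quoted fields can span several lines, the reader's `multiLine` option would be needed as well.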