I need to read a CSV file with pandas in which one of the fields is wrapped in doubled double quotes:
"column1","column2","column3","column4"
"10",""AB"","ABCD","abcd"
"11",""CD,E"","CDEF","abcd"
"12",""WER"","DEF,31","abcd"
I expect the correctly parsed DataFrame to look like this:
column1 column2 column3 column4
10 AB ABCD abcd
11 "CD,E" CDEF abcd
12 WER "DEF,31" abcd
I tried using
df = pd.read_csv('sample.txt', quotechar='""', quoting=csv.QUOTE_ALL)
and
df = pd.read_csv('sample.txt', quotechar='"', quoting=csv.QUOTE_ALL)
but these fail, respectively, with
TypeError: "quotechar" must be a 1-character string
and
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 5
Is there a way to read this file as is, without preprocessing it to remove the doubled double quotes from the data?
When column2 has no commas, I can read the data, though with some extra quotes that I can strip in later processing steps. The parsing problem occurs only when column2 contains a comma.
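To show where the fifth field seems to come from, here is a minimal sketch that tokenizes the same data with Python's standard csv module rather than pandas. It assumes that pandas' tokenizer treats quotes in a similar way, which I have not verified. The SAMPLE string is just the file content from above, inlined so the snippet runs on its own.

```python
import csv
import io

# Same content as sample.txt above, inlined so the snippet is self-contained.
SAMPLE = '''"column1","column2","column3","column4"
"10",""AB"","ABCD","abcd"
"11",""CD,E"","CDEF","abcd"
"12",""WER"","DEF,31","abcd"
'''

# Default dialect: delimiter=',', quotechar='"', doublequote=True, non-strict.
rows = list(csv.reader(io.StringIO(SAMPLE)))

# Print the file line number, the number of fields found, and the fields.
for line_number, row in enumerate(rows, start=1):
    print(line_number, len(row), row)
```

With the csv module, the leading "" of column2 is read as an opening quote followed immediately by a closing quote, so the rest of the field is treated as unquoted text. Lines 2 and 4 still split into four fields (with the trailing "" left in column2), but on line 3 the comma inside CD,E is no longer protected and the row splits into five fields, which matches the line number in the ParserError.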