I have a CSV file as follows:
customer_id,date_time,product
1,"2018-10-08 00:00:00",[]
2,"2018-03-26 00:00:00","["apple","orange"]"
As you can see, the third column (product) is messy: when the field has no text in it, the square brackets are not wrapped in double quotes.
My problem is that when I import the file with pandas:
df = pd.read_csv('df.csv', sep=',')
I get the following error message:
ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4
I am fairly confident pandas is confused by the comma between "apple" and "orange", even though both values belong to the same column. I found the question "Python Pandas Error tokenizing data", where the solution:
data = pd.read_csv('file1.csv', error_bad_lines=False)
is suggested. However, this is not viable in my case, as it would skip too many rows. I am new to Python. In the past, the following in R would have imported the file with no problem:
df <- read.csv(file.choose(), stringsAsFactors = FALSE)
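To make it concrete what I am after, here is a minimal sketch of the kind of manual parsing I would like to avoid writing by hand. It is only an illustration, not a pandas feature: `RAW` and `parse_messy_lines` are hypothetical names, and it assumes that the first two fields never contain a comma and that the product field is always a JSON-style list, optionally wrapped in one extra pair of double quotes.

```python
import json

# Sample data copied from the file above (RAW is a hypothetical name).
RAW = '''customer_id,date_time,product
1,"2018-10-08 00:00:00",[]
2,"2018-03-26 00:00:00","["apple","orange"]"
'''

def parse_messy_lines(lines):
    """Parse lines whose last field is a bracketed list with unescaped quotes.

    Assumption: customer_id and date_time never contain commas, so splitting
    on the first two commas isolates the product field intact.
    """
    header = lines[0].strip().split(",")
    rows = []
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        customer_id, date_time, product = line.split(",", 2)
        # Drop the optional outer quotes around the bracketed list.
        if product.startswith('"[') and product.endswith(']"'):
            product = product[1:-1]
        values = [int(customer_id), date_time.strip('"'), json.loads(product)]
        rows.append(dict(zip(header, values)))
    return rows

rows = parse_messy_lines(RAW.splitlines())
```

The resulting list of dicts could then be handed to `pd.DataFrame(rows)`, but I would prefer a way to get `read_csv` to cope with the file directly.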