Read a csv into pandas that has commas within first/index cells of the csv rows without changing value

Question

Ok, I get this error...: "pandas.errors.ParserError: Error tokenizing data. C error: Expected 6 fields in line 12, saw 7"

...when trying to import a csv into a python script with pandas.read_csv():

path,Drawing_but_no_F5,Paralell_F5,Fixed,Needs_Attention,Errors
R:\13xx   Original Ranch Buildings\1301 Stonehouse\1301-015\F - Bid Documents and Contract Award,Yes,No,No,No,No
R:\13xx   Original Ranch Buildings\1302 Carriage House\1302-026A Carriage House, Redo North Side Landscape\F - Bid Document and Contract Award,Yes,No,No,No,No
R:\13xx   Original Ranch Buildings\1302 Carriage House\1302-028\F - Bid Documents and Contract Award,Yes,No,No,No,No
R:\13xx   Original Ranch Buildings\1302 Carriage House\1302-029\F - Bid Documents and Contract Award,Yes,No,No,No,No

Obviously, in the above entries, it is the third line that throws the error. Caveats include that I have to use that column as a path to process files there so changing the entry is not allowed. CSV is created elsewhere; I get it as-is. I do want to preserve the column header. This filepath column is used later as an index, so I would like to preserve that.

Many, many similar issues, but solutions seem very specific and I cannot get them to cooperate for my use case:

Pandas, read CSV ignoring extra commas: Solutions seem to change entry values or rely on the cells being in the last column.

Commas within CSV Data: Solution involves SQL tools, methinks. I don't want to read the csv into SQL tables. The csv file is already delimited by commas, so I don't think changing the sep value will work (I cannot get it to work, yet).

Problems reading CSV file with commas and characters in pandas: Solution throws error: "for line in reader:_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)" Not too optimistic, since the OP had the cell value in quotes whereas I do not.

Turns out there is a parameter in the pandas csv methods that resolves this: DataFrame.to_csv('filename.csv', quoting=csv.QUOTE_NONNUMERIC) – constdoc, Aug 08 '19 at 21:16

Peter Leimbigler · Answer 1 · 2019-08-07T20:52:49.283

1

Here is a solution which is a minor modification of the accepted answer by @DSM in the last thread to which you linked (Problems reading CSV file with commas and characters in pandas).

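import csv

# newline='' lets the csv module handle line endings itself (avoids blank rows on Windows)
with open('original.csv', 'r', newline='') as infile, open('fixed.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for line in reader:
        # Rejoin everything left of the last five fields into a single path field
        fixed_row = [','.join(line[:-5])] + line[-5:]
        writer.writerow(fixed_row)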

After running the above preprocessing code, you should be able to read fixed.csv using pd.read_csv().
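For the read step itself, here is a minimal sketch, assuming the preprocessing has already wrapped the comma-containing path in double quotes. An in-memory `io.StringIO` stands in for fixed.csv so the snippet runs on its own; `index_col="path"` is one way to keep the header and use the path column as the index, which is what the question asks for.

```python
import io

import pandas as pd

# Stand-in for fixed.csv: the path containing a comma is wrapped in double quotes.
fixed_csv = io.StringIO(
    "path,Drawing_but_no_F5,Paralell_F5,Fixed,Needs_Attention,Errors\n"
    r'"R:\13xx   Original Ranch Buildings\1302 Carriage House\1302-026A Carriage House, Redo North Side Landscape\F - Bid Document and Contract Award",Yes,No,No,No,No' "\n"
)

# index_col="path" keeps the header row and uses the file path column as the index.
df = pd.read_csv(fixed_csv, index_col="path")
print(df.index[0])
```

With a real file, the `io.StringIO` object would be replaced by the path to fixed.csv.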

This solution depends on knowing how many of the rightmost columns are always formatted correctly. In your example data, the rightmost five columns are always good, so we treat everything to the left of these columns as a single field. With its default settings, csv.writer() wraps that field in double quotes only when it contains a comma, so paths without a comma are written unquoted.
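If an intermediate file is not wanted, the same idea can be sketched in memory by splitting each line from the right. This is not part of the original answer: it assumes, as above, that the last five fields never contain commas and that no field in the file is quoted. The helper name `read_right_anchored_csv` and its `n_trailing` parameter are made up for illustration.

```python
import io

import pandas as pd


def read_right_anchored_csv(handle, n_trailing=5):
    """Parse lines whose first field may contain unquoted commas.

    Assumption: the last `n_trailing` fields never contain a comma.
    """
    # rsplit with a maximum of n_trailing splits peels fields off the right,
    # leaving whatever remains (commas included) as the first field.
    rows = [
        line.rstrip("\r\n").rsplit(",", n_trailing)
        for line in handle
        if line.strip()  # skip blank lines
    ]
    header, data = rows[0], rows[1:]
    return pd.DataFrame(data, columns=header).set_index(header[0])


# Two rows from the question; an open file object would work the same way.
sample = io.StringIO(
    "path,Drawing_but_no_F5,Paralell_F5,Fixed,Needs_Attention,Errors\n"
    r"R:\13xx   Original Ranch Buildings\1301 Stonehouse\1301-015\F - Bid Documents and Contract Award,Yes,No,No,No,No" "\n"
    r"R:\13xx   Original Ranch Buildings\1302 Carriage House\1302-026A Carriage House, Redo North Side Landscape\F - Bid Document and Contract Award,Yes,No,No,No,No" "\n"
)

df = read_right_anchored_csv(sample)
print(df.index.tolist())
```

Splitting from the right keeps the path values byte-for-byte as they appear in the file, but it breaks as soon as one of the trailing columns contains a comma.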

edited Aug 07 '19 at 20:52

answered Aug 07 '19 at 00:40

Peter Leimbigler


I can modify this to work. The for loop is indented too far. Also, the path column is in quotes for rows with the 'extra' cell/comma, whereas the others are not in quotes. I will probably want them all to be in quotes if this is to be the solution. Thanks. – constdoc Aug 07 '19 at 17:07
Whoops, edited the indentation. Glad this is helpful! – Peter Leimbigler Aug 07 '19 at 20:52
Turns out there is a parameter in pandas.read_csv() that does this – constdoc Aug 08 '19 at 21:16
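On the request above to have every path in quotes: one possible variation of the answer's preprocessing, sketched here as an assumption rather than as what either commenter ended up using, is to pass `quoting=csv.QUOTE_ALL` to `csv.writer()`. In-memory buffers stand in for original.csv and fixed.csv, and the name `N_GOOD` is made up for illustration.

```python
import csv
import io

N_GOOD = 5  # assumption from the answer: the last five fields are always clean

# Stand-in for original.csv, using rows from the question.
raw = io.StringIO(
    "path,Drawing_but_no_F5,Paralell_F5,Fixed,Needs_Attention,Errors\n"
    r"R:\13xx   Original Ranch Buildings\1301 Stonehouse\1301-015\F - Bid Documents and Contract Award,Yes,No,No,No,No" "\n"
    r"R:\13xx   Original Ranch Buildings\1302 Carriage House\1302-026A Carriage House, Redo North Side Landscape\F - Bid Document and Contract Award,Yes,No,No,No,No" "\n"
)

out = io.StringIO()  # stand-in for fixed.csv
# QUOTE_ALL wraps every field in double quotes, not only those containing commas.
writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
for fields in csv.reader(raw):
    # Rejoin everything left of the last N_GOOD fields into one path field.
    writer.writerow([",".join(fields[:-N_GOOD])] + fields[-N_GOOD:])

fixed_text = out.getvalue()
print(fixed_text)
```

The quotes are only CSV syntax: a reader such as pd.read_csv() strips them again, so the path values themselves are unchanged.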
