I get this error: "pandas.errors.ParserError: Error tokenizing data. C error: Expected 6 fields in line 12, saw 7"

It happens when I try to import a CSV into a Python script with pandas.read_csv():

path,Drawing_but_no_F5,Paralell_F5,Fixed,Needs_Attention,Errors
R:\13xx   Original Ranch Buildings\1301 Stonehouse\1301-015\F - Bid Documents and Contract Award,Yes,No,No,No,No
R:\13xx   Original Ranch Buildings\1302 Carriage House\1302-026A Carriage House, Redo North Side Landscape\F - Bid Document and Contract Award,Yes,No,No,No,No
R:\13xx   Original Ranch Buildings\1302 Carriage House\1302-028\F - Bid Documents and Contract Award,Yes,No,No,No,No
R:\13xx   Original Ranch Buildings\1302 Carriage House\1302-029\F - Bid Documents and Contract Award,Yes,No,No,No,No

In the entries above, it is the third line (the second data row) that throws the error, because its path contains a comma. Some constraints: I have to use that column as a path to process the files there, so changing the entry is not allowed. The CSV is created elsewhere; I get it as-is. I want to preserve the column headers. The path column is used later as an index, so I would like to preserve that as well.

There are many similar questions, but the solutions seem very specific and I cannot get them to work for my use case:

Pandas, read CSV ignoring extra commas: The solutions seem to change entry values or rely on the affected cells being in the last column.

Commas within CSV Data: The solution involves SQL tools, I think. I don't want to read the CSV into SQL tables. The file is already delimited by commas, so I don't think changing the sep value will work (I cannot get it to work yet).

Problems reading CSV file with commas and characters in pandas: The solution throws an error: "for line in reader:_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)" I'm not too optimistic, since the OP had the cell value in quotes whereas I do not.

1 Answer

Here is a solution which is a minor modification of the accepted answer by @DSM in the last thread to which you linked (Problems reading CSV file with commas and characters in pandas).

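import csv

# newline='' lets the csv module manage line endings itself (avoids blank rows on Windows)
with open('original.csv', 'r', newline='') as infile, open('fixed.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for line in reader:
        # The last five fields are reliable; rejoin everything before them into one path field
        fixed_row = [','.join(line[:-5])] + line[-5:]
        writer.writerow(fixed_row)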

After running the above preprocessing code, you should be able to read fixed.csv using pd.read_csv(). Passing index_col='path' keeps the path column as the index.

This solution depends on knowing how many of the rightmost columns are always formatted correctly. In your example data, the rightmost five columns are always good, so we treat everything to the left of these columns as a single field, which csv.writer() wraps in double quotes whenever it contains a comma.
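If you would rather not write an intermediate file, the same idea can be sketched in memory. This is only an illustration under the same assumption (the rightmost five columns are always well formed); repair_rows and n_trailing are names made up for this example, and the sample uses the header and the problem row from the question.

```python
import csv
import io


def repair_rows(text, n_trailing=5):
    """Return CSV text in which every row has n_trailing + 1 fields.

    Hypothetical helper: assumes the last n_trailing fields of each row
    are always well formed, so any surplus fields belong to the path.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        # Rejoin everything left of the trusted columns into one field;
        # csv.writer quotes it if it contains a comma.
        writer.writerow([",".join(row[:-n_trailing])] + row[-n_trailing:])
    return out.getvalue()


header = "path,Drawing_but_no_F5,Paralell_F5,Fixed,Needs_Attention,Errors"
bad_row = (
    r"R:\13xx   Original Ranch Buildings\1302 Carriage House"
    r"\1302-026A Carriage House, Redo North Side Landscape"
    r"\F - Bid Document and Contract Award,Yes,No,No,No,No"
)
fixed = repair_rows(header + "\n" + bad_row + "\n")

# Parse the repaired text again to confirm each row now has six fields.
rows = list(csv.reader(io.StringIO(fixed)))
```

The repaired text can then be handed to pandas directly, for example pd.read_csv(io.StringIO(fixed), index_col='path'), which also keeps the path column as the index. Like the file-based version, this breaks down if any of the trailing columns can themselves contain a comma.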

Peter Leimbigler