5

I checked out this answer as I am having a similar problem.

Python Pandas Error tokenizing data

However, for some reason ALL of my rows are being skipped.

My code is simple:

import pandas as pd

fname = "data.csv"
input_data = pd.read_csv(fname) 

and the error I get is:

  File "preprocessing.py", line 8, in <module>
    input_data = pd.read_csv(fname) #raw data file ---> pandas.core.frame.DataFrame type
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 465, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 251, in _read
    return parser.read()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 710, in read
    ret = self._engine.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 1154, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 754, in pandas.parser.TextReader.read (pandas/parser.c:7391)
  File "pandas/parser.pyx", line 776, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7631)
  File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._read_rows (pandas/parser.c:8253)
  File "pandas/parser.pyx", line 816, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8127)
  File "pandas/parser.pyx", line 1728, in pandas.parser.raise_parser_error (pandas/parser.c:20357)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 11 fields in line 5, saw 13
user1452494
  • 4
    So somehow we're supposed to reverse-engineer from the error your data that produced it? Please post sample raw input data – EdChum Apr 20 '15 at 17:46
  • It looks like your CSV doesn't have the same number of fields on every line. Try opening it in Excel or your favorite spreadsheet program to verify its structure. – MattDMo Apr 20 '15 at 17:50
  • This description got me here and this was the same problem I had. +1 for that. – calmrat Aug 02 '15 at 18:15
  • Dynamically generate column names for variable number of columns for read_csv(): https://stackoverflow.com/a/52890095/1427624 – P-S Oct 19 '18 at 10:01

8 Answers

10

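The solution is to use pandas' built-in delimiter "sniffing". Passing sep=None makes pandas detect the separator itself; only the Python parsing engine supports this, so it is requested explicitly here.

input_data = pd.read_csv(fname, sep=None, engine="python")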
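
To see what sniffing does, here is a minimal sketch using only the standard library's csv.Sniffer, which the pandas documentation names as the tool behind sep=None. The helper name sniff_delimiter, the candidate list, and the sample text are assumptions made for illustration; they are not part of pandas.

```python
import csv
import io


def sniff_delimiter(text, candidates=",;\t|"):
    """Guess the delimiter of CSV text using csv.Sniffer (hypothetical helper)."""
    return csv.Sniffer().sniff(text, delimiters=candidates).delimiter


# Made-up sample: a semicolon-separated file that a comma parser would misread.
sample = "name;age;city\nalice;30;paris\nbob;25;rome\n"

delimiter = sniff_delimiter(sample)
rows = list(csv.reader(io.StringIO(sample), delimiter=delimiter))
print(repr(delimiter), rows[0])
```

Sniffing is a heuristic, so it can guess wrong on unusual files; if you know the delimiter, passing it explicitly is more reliable.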
user1452494
5

For those landing here: I got this error when the file was actually an .xls file, not a true .csv. Try re-saving it as a CSV in a spreadsheet app.
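
One way to check for this before parsing is to look at the first bytes of the file. The sketch below is an assumption-laden illustration, not a pandas feature: the function names are hypothetical, and the signatures only identify the container (OLE2 is also used by other legacy Office files, and the ZIP signature by any zip archive).

```python
# File signatures ("magic bytes") of the two common Excel containers.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx is a zip archive


def guess_container(first_bytes):
    """Classify a file by its leading bytes (hypothetical helper)."""
    if first_bytes.startswith(OLE2_MAGIC):
        return "xls"
    if first_bytes.startswith(ZIP_MAGIC):
        return "xlsx-or-zip"
    return "text-or-unknown"


def guess_file_kind(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return guess_container(handle.read(8))


print(guess_container(b"a,b,c\n1,2,3\n"))
```

If guess_file_kind("data.csv") reports an Excel container, the file needs an Excel reader or a re-save as CSV rather than different read_csv options.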

Kate Stohr
  • Wow. Thank you. Nothing was working and I spent like 2 hours googling how to figure this out. I tried everything! Turns out, the "csv" sent to me was actually a "txt" file, not a true csv. I have no idea how that even happened, since it ends in ".csv" but thank you! – ArthurH Sep 12 '19 at 18:00
2

I had the same error. I was reading my CSV data with d1 = pd.read_csv('my.csv'); when I changed it to d1 = pd.read_csv('my.csv', sep='\t') it worked. Try this if your delimiter is not ',': the default separator is ',', so parsing goes wrong unless you state the actual one explicitly. See pandas.read_csv.

ShenDu
1

This error means that your rows do not all have the same number of columns. In your case, the parser expected 11 columns based on the rows before line 5, but line 5 has 13 fields.

To investigate, you can read the file with the csv module and inspect each row:

import csv

# newline='' lets the csv module handle line endings inside quoted fields
with open('filename.csv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter=',')  # use a comma delimiter for a CSV file
    for row in reader:
        print(row)
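
Printing every row gets unwieldy for large files, so a variation is to report only the rows whose field count differs. This is a minimal sketch: the function name find_ragged_rows is hypothetical, and it assumes the first non-empty row defines the expected field count, which may not match how pandas infers it for every file.

```python
import csv
import io


def find_ragged_rows(lines, delimiter=","):
    """Return (expected_count, [(line_number, field_count), ...]) for odd rows."""
    reader = csv.reader(lines, delimiter=delimiter)
    expected = None
    ragged = []
    for row in reader:
        if not row:  # skip completely empty lines
            continue
        if expected is None:
            expected = len(row)  # assumption: first non-empty row is the reference
        elif len(row) != expected:
            ragged.append((reader.line_num, len(row)))
    return expected, ragged


# Made-up sample in which the third line has two extra fields.
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6,7,8\n9,10,11\n")
expected, ragged = find_ragged_rows(sample)
print(expected, ragged)
```

For a real file, pass an open handle instead of the sample, for example one obtained with open('filename.csv', newline='').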
Zia
0

This parsing error could occur for multiple reasons and solutions to the different reasons have been posted here as well as in Python Pandas Error tokenizing data.

I posted a solution to one possible reason for this error here: https://stackoverflow.com/a/43145539/6466550

computerist
0

I have had similar problems. With my CSV files it occurs because they were created in R, so they have some extra commas and different spacing than a "regular" CSV file.

I found that if I did a read.table in R, I could then save the data using write.csv with the option row.names = F.

I could not get any of the read options in pandas to help me.

BrianM
0

The problem could be that one or more rows of the CSV file contain more delimiters (commas) than expected. It is solved once every row has the same number of delimiters as the first line of the file, where the column names are defined.

jmish
0

Use \t+ in the separator pattern instead of \t.

import pandas as pd

fname = "data.csv"
# a regex separator such as \t+ is handled by the Python parsing engine
input_data = pd.read_csv(fname, sep=r'\t+', header=None, engine='python')
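
The difference between the two patterns can be seen with the standard library's re module alone. This is only an illustration on a made-up line, not pandas' own parsing code.

```python
import re

# Made-up line with two consecutive tabs between the first two values.
line = "a\t\tb\tc"

single_tab = re.split(r"\t", line)   # every tab is a separator
tab_run = re.split(r"\t+", line)     # a run of tabs counts as one separator

print(single_tab)
print(tab_run)
```

Note the trade-off: \t+ also collapses fields that are genuinely empty, so it fits files padded with extra tabs but can hide missing values in files where an empty field is meaningful.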

Derrick Kuria