ParserError: Error tokenizing data. C error: Expected 2503 fields in line 2624, saw 52523

Question

I am using the pandas read_csv function to read my CSV file:

feature_file_df_5=pd.read_csv('/home/jayashree/Documents/Nokia/DataSet/SMT Data Analytics/SPI (Solder Paste Inspection)/086990A-108-FHFB-TRX-985676H-BOTTOM-N_0608_2001_2500.csv',header=501)

I get a parser error:

/home/jayashree/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.pyc in read(self, nrows)
   1717     def read(self, nrows=None):
   1718         try:
-> 1719             data = self._reader.read(nrows)
   1720         except StopIteration:
   1721             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11138)()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:11884)()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows (pandas/_libs/parsers.c:11755)()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error (pandas/_libs/parsers.c:28765)()

ParserError: Error tokenizing data. C error: Expected 2503 fields in line 2624, saw 52523

Based on suggestions from this thread [1], I tried adding the sep option:

feature_file_df_5=pd.read_csv('/home/jayashree/Documents/Nokia/DataSet/SMT Data Analytics/SPI (Solder Paste Inspection)/086990A-108-FHFB-TRX-985676H-BOTTOM-N_0608_2001_2500.csv', sep=',',header=501)

I still get the same error. When I use sep=None instead:

feature_file_df_5=pd.read_csv('/home/jayashree/Documents/Nokia/DataSet/SMT Data Analytics/SPI (Solder Paste Inspection)/086990A-108-FHFB-TRX-985676H-BOTTOM-N_0608_2001_2500.csv', sep=None,header=501)

I get a different error:

/home/jayashree/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.pyc in _rows_to_cols(self, content)
   2782                 msg = ('Expected %d fields in line %d, saw %d' %
   2783                        (col_len, row_num + 1, actual_len))
-> 2784                 if len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
   2785                     # see gh-13374
   2786                     reason = ('Error could possibly be due to quotes being '

TypeError: object of type 'NoneType' has no len()


  [1]: https://stackoverflow.com/questions/18039057/python-pandas-error-tokenizing-data

When I open the file in a spreadsheet, I cannot find any problem; all rows are present. How can I resolve this error?

score 0 · Answer 1 · answered Oct 03 '17 at 16:05

You could experiment with the quoting and quotechar parameters, which control how quoted fields are parsed and may help if the field structure is being misread. More details here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
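
To see whether quoting is what changes the field count, one option is to count fields per row with the standard-library csv module under different quoting settings before going back to pandas. This is only a sketch: it assumes Python 3, and the helper name field_count_histogram and the sample data are made up for illustration.

```python
import csv
import io
from collections import Counter

def field_count_histogram(lines, quoting=csv.QUOTE_MINIMAL, quotechar='"'):
    """Return a Counter mapping 'number of fields' -> 'number of rows'."""
    reader = csv.reader(lines, quoting=quoting, quotechar=quotechar)
    return Counter(len(row) for row in reader)

# Hypothetical sample: the second row has a comma inside a quoted field.
sample = 'a,b,c\n1,"x,y",3\n4,5,6\n'

# Quotes honoured: the quoted comma is not treated as a separator.
with_quotes = field_count_histogram(io.StringIO(sample))

# Quotes ignored: the same row splits into one extra field.
without_quotes = field_count_histogram(io.StringIO(sample), quoting=csv.QUOTE_NONE)

print(with_quotes, without_quotes)
```

For a real file, pass an open file object (opened with newline='') instead of the StringIO. If the histogram changes noticeably between the two settings, passing the same quoting constant to pd.read_csv (for example quoting=csv.QUOTE_NONE) may be worth trying.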

Alternatively, if only one or a few rows are broken and can be dropped, use error_bad_lines=False.
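
A minimal sketch of skipping malformed rows is shown below. Note that error_bad_lines was deprecated in pandas 1.3 in favour of on_bad_lines="skip" and removed in pandas 2.0, so the wrapper tries the newer spelling first. The wrapper name read_csv_skipping_bad_lines and the sample data are hypothetical.

```python
import io
import pandas as pd

def read_csv_skipping_bad_lines(source, **kwargs):
    """Read a CSV, dropping rows that have more fields than expected."""
    try:
        # pandas >= 1.3 spells the option on_bad_lines="skip".
        return pd.read_csv(source, on_bad_lines="skip", **kwargs)
    except TypeError:
        # Older pandas (as in the question) only accepts error_bad_lines.
        if hasattr(source, "seek"):
            source.seek(0)
        return pd.read_csv(source, error_bad_lines=False, **kwargs)

# Hypothetical sample: the third line has five fields instead of three.
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6,7,8\n9,10,11\n")
df = read_csv_skipping_bad_lines(sample)
print(df)
```

Rows are dropped silently (or with a warning on older versions), so it is worth comparing the resulting row count against the file before trusting the result.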

