
I'm working with a big CSV, trying to open it like this:

import pandas as pd
df = pd.read_csv('path', encoding='Windows-1251', sep=';', error_bad_lines=False)

I get an "Error tokenizing data" error, and I want to understand what's wrong with these lines. Jupyter shows me a long list of messages like `Skipping line 908585: expected 10 fields, saw 14`.

Is there any way to collect the numbers of all the bad lines into a list?

I understand I can open the CSV as a text file, split each line and find all lines with more than 10 elements, but that doesn't fully solve my problem. Apart from these 14-column lines, there are around 20k other lines that pd.read_csv didn't read, I don't see any other error for them, and I don't understand what's wrong with them.

This is why I need exactly the lines that pd.read_csv reports as bad.
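For reference, the "open it as a text file" approach can be sketched with the standard-library `csv.reader` rather than a plain `split`, so that a quoted field containing `;` is not miscounted. `find_bad_lines` is a hypothetical helper, not part of pandas. It only reports records with more fields than expected, because pandas pads short rows with NaN instead of skipping them, and it records `reader.line_num` (the physical line on which a record ends). I'm assuming those numbers line up with pandas' `Skipping line N` messages, but I haven't checked that for files whose quoted fields span several lines, so treat it as an approximation.

```python
import csv
import io


def find_bad_lines(lines, expected_fields, sep=";", quoting=csv.QUOTE_MINIMAL):
    """Return (line_number, field_count) for every record with too many fields.

    Hypothetical helper: `lines` is any iterable of text lines (e.g. an open file).
    Only records with MORE fields than expected are reported, because pandas
    pads short rows with NaN instead of skipping them.
    """
    reader = csv.reader(lines, delimiter=sep, quoting=quoting)
    bad = []
    for row in reader:
        # csv.reader yields [] for blank lines; pandas skips those by default
        if row and len(row) > expected_fields:
            # line_num = physical lines read so far, i.e. where this record ends
            bad.append((reader.line_num, len(row)))
    return bad


sample = "a;b;c\n1;2;3\n4;5;6;7\n8;9\n"
print(find_bad_lines(io.StringIO(sample), expected_fields=3))  # [(3, 4)]

# For the file in the question (assumed: 10 expected fields, as in the message):
# with open("path", encoding="Windows-1251", newline="") as f:
#     bad = find_bad_lines(f, expected_fields=10, quoting=csv.QUOTE_NONE)
# bad_line_numbers = [n for n, _ in bad]
```

Passing `quoting=csv.QUOTE_NONE` treats `"` as an ordinary character, which mirrors the `quoting=csv.QUOTE_NONE` option suggested for `read_csv` in the comments; comparing the results with and without it can show whether stray quotes are what swallow the other lines.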

upd: I know the reason for the errors; the question is about collecting these errors into a Python list.
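A possible way to get the list from pandas itself, assuming you can use pandas 1.4 or newer (`error_bad_lines` was deprecated in 1.3 and removed in 2.0): `on_bad_lines` also accepts a callable when `engine="python"`. pandas calls it with each row that has too many fields, already split on `sep`, and drops the row if the callable returns `None`. `collect_bad_line` and `bad_lines` are illustrative names. Note that the callable receives the row's contents, not its line number, so this gives you the bad rows themselves rather than their numbers.

```python
import io

import pandas as pd

bad_lines = []  # every row pandas rejected, as a list of strings


def collect_bad_line(bad_line):
    # pandas passes the offending row already split on `sep`.
    # Record it and return None so that the row is skipped.
    bad_lines.append(bad_line)
    return None


data = "a;b;c\n1;2;3\n4;5;6;7\n8;9;10\n"
df = pd.read_csv(io.StringIO(data), sep=";", engine="python",
                 on_bad_lines=collect_bad_line)
print(bad_lines)  # [['4', '5', '6', '7']]
print(len(df))    # 2

# For the file in the question this would be:
# df = pd.read_csv("path", encoding="Windows-1251", sep=";",
#                  engine="python", on_bad_lines=collect_bad_line)
```

The python engine is slower than the default C engine, which may be noticeable on a file with 900k+ lines.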

Faenno
  • Do the cells of the CSV file contain `;`? I mean not as the separator, but as part of the data itself? –  Dec 07 '21 at 17:44
  • The error itself is happening because there are too many semicolons (`;`) on line 908585. –  Dec 07 '21 at 17:46
  • Does this answer your question? [Pandas DataFrame Read Skipping line XXX: expected X fields, saw Y](https://stackoverflow.com/questions/43891391/pandas-dataframe-read-skipping-line-xxx-expected-x-fields-saw-y) – Wilian Dec 07 '21 at 17:53
  • `..., error_bad_lines=False, quoting=csv.QUOTE_NONE)` – Wilian Dec 07 '21 at 17:54
  • @Wilian `NameError: name 'csv' is not defined` – Faenno Dec 07 '21 at 18:11
  • import csv .... – Wilian Dec 07 '21 at 18:12
  • @Wilian it helped me read almost the whole file, thank you! But I'd still like to know whether it's possible to collect the errors from the function into a list/string :) – Faenno Dec 07 '21 at 18:17
  • try this solution, works here in my test: https://stackoverflow.com/a/59420728/16267793 – Wilian Dec 07 '21 at 18:25
