I'm working with a large CSV file and trying to open it like this:

import pandas as pd

df = pd.read_csv('path', encoding='Windows-1251', sep=';', error_bad_lines=False)
I get an "Error tokenizing data" error and want to understand what's wrong with those lines. Jupyter shows me a long list of messages like Skipping line 908585: expected 10 fields, saw 14.
Is there any way to collect the numbers of all the bad lines into a Python list?
I understand that I can open the CSV as a text file, split each line on the separator, and find all lines with more than 10 fields, but that doesn't completely solve my problem: besides the lines with 14 columns, there are around 20k lines that pd.read_csv didn't read, and I don't see any other error or understand what's wrong with them. That's why I need exactly the lines that pd.read_csv reports as bad.
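
For reference, the text-file approach I mean is something like this (a rough sketch; I'm assuming the same file path and ';' delimiter as in the read_csv call above, and I'm using the csv module so that quoted separators are counted correctly):

import csv

bad_lines = []
with open('path', encoding='Windows-1251', newline='') as f:
    # 1-based numbering, to match the line numbers pandas prints
    for line_no, row in enumerate(csv.reader(f, delimiter=';'), start=1):
        if len(row) > 10:  # more fields than the 10 expected
            bad_lines.append(line_no)

As explained above, this only finds the lines with too many fields and misses the other ~20k lines that pandas skipped.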
upd: I know the reason for the errors; the question is about building a Python list of them.
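
To make the goal concrete, this is the kind of thing I'd like to end up with (a sketch only: it assumes the skipping warnings are written to stderr and can be captured and parsed, which I'm not sure is reliable, and that's essentially what I'm asking):

import io
import re
import contextlib

import pandas as pd

buf = io.StringIO()
with contextlib.redirect_stderr(buf):  # try to capture the printed warnings
    df = pd.read_csv('path', encoding='Windows-1251', sep=';', error_bad_lines=False)

# Parse line numbers out of messages like "Skipping line 908585: expected 10 fields, saw 14"
bad_lines = [int(n) for n in re.findall(r'Skipping line (\d+)', buf.getvalue())]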