Many questions on this topic have already been asked on SO (and elsewhere). None of the numerous answers has really helped me so far. If I missed a useful one, please let me know.
I simply would like to read a CSV file with pandas into a dataframe. Sounds like a simple task.
My file Test.csv:
1,2,3,4,5
1,2,3,4,5,6
,,3,4,5
1,2,3,4,5,6,7
,2,,4
My code:
import pandas as pd
df = pd.read_csv('Test.csv',header=None)
My error:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 2, saw 6
My guess is that pandas looks at the first line and expects the following rows to have the same number of fields. If they do not, it stops with an error.
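A small check of that guess, as a sketch only: it feeds the same five rows to read_csv from memory, once in the original order and once with the widest row first. The helper name try_read is made up for this illustration. As far as I can tell, only rows that are longer than the first row trigger the error, while shorter rows are padded with NaN.

```python
import io

import pandas as pd
from pandas.errors import ParserError

# Same content as Test.csv, held in memory so no file is needed.
rows = ["1,2,3,4,5", "1,2,3,4,5,6", ",,3,4,5", "1,2,3,4,5,6,7", ",2,,4"]


def try_read(lines):
    """Hypothetical helper: return the DataFrame, or the ParserError that was raised."""
    try:
        return pd.read_csv(io.StringIO("\n".join(lines)), header=None)
    except ParserError as exc:
        return exc


# Original order: the first row has 5 fields, the second has 6.
original = try_read(rows)

# Widest row first: every later row has at most as many fields as the first.
widest_first = try_read(sorted(rows, key=lambda r: r.count(","), reverse=True))

print(type(original).__name__)
print(type(widest_first).__name__)
```

If the guess holds, the first call ends in a ParserError and the second returns a DataFrame.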
The numerous answers suggest options such as:
error_bad_lines=False
or header=None
or skiprows=3
and other suggestions that do not help here.
However, I don't want to ignore or skip any lines, and I don't know in advance how many columns and rows the data file has.
So it basically boils down to finding the maximum number of columns in the data file. Is this the way to go? I had hoped there was an easy way to read a CSV file whose first line does not contain the maximum number of columns. Thank you for any hints. I'm using Python 3.6.3, Pandas 0.24.1 on Win7.
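For reference, this is roughly what I mean by finding the maximum number of columns first. It is a minimal sketch, not necessarily the best way: the helper name read_ragged_csv is made up, the sample file is written to a temporary directory only to keep the snippet self-contained, and it assumes the file is small enough to be read twice.

```python
import csv
import tempfile
from pathlib import Path

import pandas as pd


def read_ragged_csv(path, delimiter=","):
    """Hypothetical helper: read a CSV whose rows have differing field counts."""
    # First pass: count the fields of every row and keep the largest count.
    with open(path, newline="") as handle:
        max_cols = max(len(row) for row in csv.reader(handle, delimiter=delimiter))
    # Second pass: explicit column names tell pandas how many fields to expect,
    # so shorter rows are padded with NaN instead of raising an error.
    return pd.read_csv(path, header=None, names=list(range(max_cols)),
                       delimiter=delimiter)


# Recreate Test.csv from above in a temporary directory.
sample = "1,2,3,4,5\n1,2,3,4,5,6\n,,3,4,5\n1,2,3,4,5,6,7\n,2,,4\n"
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "Test.csv"
    path.write_text(sample)
    df = read_ragged_csv(path)

print(df)
```

This keeps all five rows, but it reads the file twice, which is why I am asking whether there is a more direct option.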