I have a script that reads some measurement data in CSV form, and then does all kinds of plotting and stuff with it.
Now I have a new dataset, where some idiot deemed it helpful to add some random comments at the end of the line, like so:
01.02.1988 00:00:00 ; 204.94
01.03.1988 00:00:00 ; 204.87 ; something
01.04.1988 00:00:00 ; 205.41
01.05.1988 00:00:00 ; 205.64 ; something ; something else
01.06.1988 00:00:00 ; 205.59 ; also something
01.07.1988 00:00:00 ; 205.24
which gives me a nice
ValueError: Expected 2 fields in line 36, saw 3
and so on.
According to this and this, I have to use the names=['whatever','else'] argument when reading it.
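For reference, this is roughly what I understand the names approach to look like on the sample lines above. It is only a minimal sketch: the filler column names (c1, c2) and the in-memory io.StringIO source are made up for illustration, and the idea is that names lists as many columns as the longest line while usecols keeps only the two real ones. Everything is read as strings and converted explicitly so that stray spaces around the delimiter do not matter.

```python
import io
import pandas as pd

# Sample lines from above; some rows carry extra trailing fields.
raw = """01.02.1988 00:00:00 ; 204.94
01.03.1988 00:00:00 ; 204.87 ; something
01.04.1988 00:00:00 ; 205.41
01.05.1988 00:00:00 ; 205.64 ; something ; something else
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    header=None,                          # the data has no header row
    names=["Date", "level", "c1", "c2"],  # c1, c2 are hypothetical filler names
    usecols=["Date", "level"],            # keep only the real columns
    dtype=str,                            # read as text, convert explicitly below
)

# Strip the spaces around the delimiter, then convert explicitly.
df["Date"] = pd.to_datetime(df["Date"].str.strip(), format="%d.%m.%Y %H:%M:%S")
df["level"] = df["level"].str.strip().astype(float)
df = df.set_index("Date")
```

The explicit format string makes the day-first order unambiguous, so no dayfirst guessing is involved.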
But somehow this goes all kinds of wrong. Here are some examples:
CSV file
Stuff
more stuff I dont need
Date;level;crap1;crap2;crap3;crap4;crap5;crap6
01.01.1988 00:00:00 ; 204.87
01.02.1988 00:00:00 ; 204.94
01.03.1988 00:00:00 ; 204.87
The "nice" header is obviously "handmade", but I should just be able to skip it!?
CSV reader
ValReader = pd.read_csv(
    csv_list[counter],
    sep=r'\s*;',
    skiprows=DateStart,
    names=['Date', 'level', 'crap1', 'crap2', 'crap3', 'crap4', 'crap5', 'crap6'],
    usecols=['Date', 'level'],
    index_col='Date',
    dayfirst=True,
    parse_dates=True,
)
What I get
print(ValReader)
level
Date
Date level
01.04.2003 00:00:00 200.76
01.05.2003 00:00:00 200.64
01.06.2003 00:00:00 200.53
As a result, level gets handled as a string.
OK, easy: that manual header line in the CSV (which worked well in a previous version that only had to handle good data) is the culprit, so I just set skiprows to skiprows=DateStart+1, but that results in
ValueError: Number of passed names did not match number of header fields in the file
So obviously I got utterly lost in how pandas handles the names and positions of columns.