The following code:
import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was: from StringIO import StringIO
data = StringIO("""a,b,c
1,2,3
4,5,6
6,7,8,9
1,2,5
3,4,5""")
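# Note: pandas 1.3+ replaces these two flags with on_bad_lines='warn'; they were removed in pandas 2.0.
pd.read_csv(data, warn_bad_lines=True, error_bad_lines=False)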
produces this output:
Skipping line 4: expected 3 fields, saw 4
   a  b  c
0  1  2  3
1  4  5  6
2  1  2  5
3  3  4  5
That is, the third data row (line 4 of the file, counting the header) is skipped because it contains four values instead of the expected three. The CSV file is therefore considered malformed.
What if I wanted a different behavior instead: not skipping lines that have more fields than expected, but keeping their values by using a larger dataframe?
For the given example, this would be the desired output ('UNK' is just an example; it might be any other string):
   a  b  c  UNK
0  1  2  3  nan
1  4  5  6  nan
2  6  7  8    9
3  1  2  5  nan
4  3  4  5  nan
This example has only one additional value, but what about an arbitrary (and a priori unknown) number of extra fields? Is there some way to obtain this through pandas read_csv?
Please note: I can already do this using csv.reader; I am just trying to switch to pandas now.
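For reference, the csv.reader route looks roughly like the following. It is only a minimal sketch: the helper name read_ragged_csv and the UNK, UNK_1, ... naming of the surplus columns are placeholders chosen for illustration, and every value comes back as a string because csv.reader does no type conversion.

```python
import csv
from io import StringIO

import pandas as pd

raw = """a,b,c
1,2,3
4,5,6
6,7,8,9
1,2,5
3,4,5"""


def read_ragged_csv(text, extra_prefix="UNK"):
    """Read CSV text whose rows may have more fields than the header.

    Hypothetical helper for illustration; not part of pandas.
    """
    rows = list(csv.reader(StringIO(text)))
    header, records = rows[0], rows[1:]
    # The widest row decides how many columns the frame needs.
    width = max(len(r) for r in rows)
    # Name the surplus columns UNK, UNK_1, UNK_2, ... (assumed naming scheme).
    n_extra = width - len(header)
    extra = [extra_prefix if i == 0 else f"{extra_prefix}_{i}" for i in range(n_extra)]
    # Pad shorter rows with None so that every row has the same length.
    padded = [r + [None] * (width - len(r)) for r in records]
    return pd.DataFrame(padded, columns=header + extra)


df = read_ragged_csv(raw)
print(df)
```

This keeps the row with four fields and leaves the extra column empty for all other rows, which is the behavior I would like to get from read_csv directly.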
Any help or hints are appreciated.