Missing data in pandas read_csv

Question

My data (lin-nan.dat):

a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
1.80,6.1,6.7,9.7,6.2
1.90,7.1,6.8,11.1,6.7
2,,6.8,12.5,7.3
2.08,,,,7.8
2.1,,7.2
2.2,,8.0
2.3,,8.7
2.4,,9.2,8.2

My code:

from pandas import read_csv
ds = read_csv ('lin-nan.dat', index_col=0, sep=',')

It fails with:

Traceback (most recent call last):
  File "read_lin.py", line 7, in <module>
    ds = read_csv ('lin-nan.dat', index_col=0, sep=',')
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 253, in read_csv
    return _read(TextParser, filepath_or_buffer, kdict)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 202, in _read
    return parser.get_chunk()
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 844, in get_chunk
    alldata = self._rows_to_cols(content)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 809, in _rows_to_cols
    raise ValueError(msg)
ValueError: Expecting 6 columns, got 5 in row 1

score 1 · Answer 1 · answered Mar 21 '13 at 01:31

You can pass error_bad_lines=False to read_csv. It skips the badly formatted lines and reports each one it skips.
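For readers on a newer pandas: error_bad_lines was deprecated in pandas 1.3 and later removed, and on_bad_lines takes its place. Below is a minimal sketch of the same idea, assuming pandas 1.3 or later. The three-column sample data is hypothetical and only there to trigger a skipped row.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the second data row has one field too many.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# on_bad_lines="skip" drops the offending row silently;
# on_bad_lines="warn" drops it and emits a warning instead.
df = pd.read_csv(StringIO(raw), on_bad_lines="skip")
print(df)
```

Note that in current pandas a "bad line" means a row with too many fields. As far as I know, rows with too few fields (as in the question's data) are padded with NaN rather than skipped.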

sebastibe


score 0 · Answer 2 · answered Nov 19 '12 at 13:50

The problem is that none of your rows has 6 fields (the longest has 5), while the header names 6 columns. I don't think there is a keyword in read_csv to overcome this.

One solution is to be more explicit (with pandas imported as pd and numpy as np):

In [1]: df = pd.read_csv('lin-nan.dat', names=list('abcde'), index_col=0, skiprows=1)

In [2]: df['f'] = np.nan

In [3]: df
Out[3]: 
        b    c     d    e   f
a                            
1.50  4.8  NaN   6.3  NaN NaN
1.60  5.2  6.5   7.2  NaN NaN
1.70  5.5  6.6   8.3  5.7 NaN
1.80  6.1  6.7   9.7  6.2 NaN
1.90  7.1  6.8  11.1  6.7 NaN
2.00  NaN  6.8  12.5  7.3 NaN
2.08  NaN  NaN   NaN  7.8 NaN
2.10  NaN  7.2   NaN  NaN NaN
2.20  NaN  8.0   NaN  NaN NaN
2.30  NaN  8.7   NaN  NaN NaN
2.40  NaN  9.2   8.2  NaN NaN
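To try this without the file on disk, here is a minimal self-contained sketch. It reads a shortened copy of the question's data through StringIO and adds the missing column with reindex rather than assigning np.nan. The shortened sample and the choice of reindex are my assumptions, not part of the session above.

```python
from io import StringIO

import pandas as pd

# Shortened copy of the question's data: 6 header names, at most 5 fields per row.
raw = """a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
2.08,,,,7.8
2.4,,9.2,8.2
"""

# Skip the header row and name only the five columns that actually occur.
df = pd.read_csv(StringIO(raw), names=list("abcde"), index_col=0, skiprows=1)

# Reindexing the columns adds 'f' filled with NaN ('a' is already the index).
df = df.reindex(columns=list("bcdef"))
print(df)
```

reindex has the advantage that you can list every expected column once and get NaN for whichever ones are absent, instead of adding them one by one.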
