My data (lin-nan.dat):

a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
1.80,6.1,6.7,9.7,6.2
1.90,7.1,6.8,11.1,6.7
2,,6.8,12.5,7.3
2.08,,,,7.8
2.1,,7.2
2.2,,8.0
2.3,,8.7
2.4,,9.2,8.2

from pandas import read_csv
ds = read_csv ('lin-nan.dat', index_col=0, sep=',')

Traceback (most recent call last):
  File "read_lin.py", line 7, in <module>
    ds = read_csv ('lin-nan.dat', index_col=0, sep=',')
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 253, in read_csv
    return _read(TextParser, filepath_or_buffer, kdict)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 202, in _read
    return parser.get_chunk()
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 844, in get_chunk
    alldata = self._rows_to_cols(content)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 809, in _rows_to_cols
    raise ValueError(msg)
ValueError: Expecting 6 columns, got 5 in row 1
nbecker

2 Answers

You can pass error_bad_lines=False to read_csv. It skips the badly formatted lines and prints each one it skips. Keep in mind that the skipped rows are dropped from the result.
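
In current pandas the option has a different name: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0, and on_bad_lines takes its place. Below is a minimal sketch, assuming pandas 1.3 or later and a small made-up sample rather than the data from the question. Note that there only rows with too many fields count as bad; rows with too few fields are padded with NaN instead of being skipped.

```python
import io

import pandas as pd

# Hypothetical sample (not the question's file): the second data row
# has one field too many, the third has one field too few.
sample = "a,b,c\n1,2,3\n4,5,6,7\n8,9\n"

# on_bad_lines="skip" silently drops rows with too many fields;
# on_bad_lines="warn" drops them too but reports each one.
skipped = pd.read_csv(io.StringIO(sample), on_bad_lines="skip")

print(skipped)
```

The row 4,5,6,7 is dropped, while the short row 8,9 is kept with NaN in column c.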

sebastibe

The problem is that the header names six columns, but no data row has more than five fields. I don't think there is a keyword in read_csv to overcome this.
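
The mismatch is easy to confirm without pandas. This is a rough sketch using only the standard-library csv module, with the question's data inlined as a string (the variable names are just for illustration):

```python
import csv
import io

# The data from the question, inlined so the sketch is self-contained.
raw = """a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
1.80,6.1,6.7,9.7,6.2
1.90,7.1,6.8,11.1,6.7
2,,6.8,12.5,7.3
2.08,,,,7.8
2.1,,7.2
2.2,,8.0
2.3,,8.7
2.4,,9.2,8.2
"""

rows = list(csv.reader(io.StringIO(raw)))
header, data = rows[0], rows[1:]

# Number of comma-separated fields in each data row.
widths = [len(row) for row in data]

print("header fields:", len(header))
print("longest data row:", max(widths))
```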

One solution is to be more explicit:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.read_csv('lin-nan.dat', names=list('abcde'), index_col=0, skiprows=1)

In [4]: df['f'] = np.nan

In [5]: df
Out[5]: 
        b    c     d    e   f
a                            
1.50  4.8  NaN   6.3  NaN NaN
1.60  5.2  6.5   7.2  NaN NaN
1.70  5.5  6.6   8.3  5.7 NaN
1.80  6.1  6.7   9.7  6.2 NaN
1.90  7.1  6.8  11.1  6.7 NaN
2.00  NaN  6.8  12.5  7.3 NaN
2.08  NaN  NaN   NaN  7.8 NaN
2.10  NaN  7.2   NaN  NaN NaN
2.20  NaN  8.0   NaN  NaN NaN
2.30  NaN  8.7   NaN  NaN NaN
2.40  NaN  9.2   8.2  NaN NaN
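
For readers on a newer pandas: recent releases pad rows that have fewer fields than the header with NaN, so the file from the question should load without the extra arguments. A minimal sketch, assuming pandas 1.x or 2.x (not the old version shown in the traceback), with the data inlined through io.StringIO instead of read from lin-nan.dat:

```python
import io

import pandas as pd

# The data from the question, inlined so the sketch is self-contained.
raw = """a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
1.80,6.1,6.7,9.7,6.2
1.90,7.1,6.8,11.1,6.7
2,,6.8,12.5,7.3
2.08,,,,7.8
2.1,,7.2
2.2,,8.0
2.3,,8.7
2.4,,9.2,8.2
"""

# Short rows are filled with NaN up to the header width, so column f
# exists and is entirely NaN; column a becomes the index.
df = pd.read_csv(io.StringIO(raw), index_col=0)

print(df)
```

The result should match the frame above, with f created by the parser rather than assigned afterwards.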
Andy Hayden