genfromtxt in python code for missing columns

Question

Input csv file:

a,b,c,d,e
1,2,3,4,2
3,4,5,6,3
3,4,5
1,2

Code:

import numpy as np

data = np.genfromtxt("sa.csv", dtype=None, delimiter=',', names=True)
print data['a'],data['b'],data['e']

I get an error:

Traceback (most recent call last):
  File "cs.py", line 3, in <module>
    data = np.genfromtxt("sa.csv", dtype=None, delimiter=',', names=True)
  File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 1593, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #4 (got 3 columns instead of 5)
    Line #5 (got 2 columns instead of 5)

How can I deal with this and compute correlations on selected columns using statsmodels in Python?

I've tidied up your post a little bit, although your traceback is still a bit wonky. You may wish to copy and paste the traceback in again. This time, paste it in, highlight the lot and press ctrl+k, or use the little `{}` icon when you click on [edit]. — Jon Clements, Feb 26 '13 at 11:12
to calculate correlation, you can just use ``np.corrcoef``, or ``np.ma.corrcoef`` if you have missing values, or use pandas. statsmodels doesn't duplicate and doesn't have those functions. — Josef, Feb 26 '13 at 14:39

Vincent · Answer 1 · 2014-01-18T02:09:06.620

Since you mention statsmodels I assume that you've got its pandas dependency installed. Pandas will parse your example properly:

import pandas as pd
import numpy as np
dat = pd.read_csv('test.csv')
np.corrcoef(dat)

array([[ 1.        ,  0.94174191,         nan,         nan],
       [ 0.94174191,  1.        ,         nan,         nan],
       [        nan,         nan,         nan,         nan],
       [        nan,         nan,         nan,         nan]])

The NaN entries are correct given the missing values.
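Note that np.corrcoef treats each row of its input as a variable, which is why the result above is 4x4 for four data rows. For correlations between selected columns, a minimal sketch is shown here, assuming the question's data is supplied as an in-memory string instead of a file. DataFrame.corr excludes NaN values pairwise; the variable names are illustrative only.

```python
import io
import pandas as pd

# The question's CSV, inlined so the example is self-contained.
raw = """a,b,c,d,e
1,2,3,4,2
3,4,5,6,3
3,4,5
1,2
"""

# read_csv fills the fields missing from short rows with NaN.
dat = pd.read_csv(io.StringIO(raw))

# Correlation matrix between the selected columns only.
# NaN values are dropped pair by pair, not row by row.
corr = dat[["a", "b", "e"]].corr()
print(corr)
```

Because column e has only two observed values, any correlation involving it rests on two points and should be read with caution.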

score 0 · Answer 2 · edited May 23 '17 at 12:32


You can make genfromtxt ignore the lines with fewer columns (see "Using genfromtxt to import csv data with missing values in numpy"), but you cannot make it parse them as incomplete rows. If you put the delimiters in place (i.e. 1,2,,,) it can work, but otherwise I don't think genfromtxt is smart enough to do what you want.
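Both options can be sketched as follows, assuming the question's data is inlined as strings. The invalid_raise=False argument makes genfromtxt skip short lines with a warning instead of raising; the second string is a hypothetical version of the file with the trailing delimiters added.

```python
import io
import warnings
import numpy as np

# The file from the question, and a variant padded with trailing delimiters.
ragged = "a,b,c,d,e\n1,2,3,4,2\n3,4,5,6,3\n3,4,5\n1,2\n"
padded = "a,b,c,d,e\n1,2,3,4,2\n3,4,5,6,3\n3,4,5,,\n1,2,,,\n"

# Option 1: skip the short lines. genfromtxt warns about each skipped line,
# so the warning is silenced here to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    skipped = np.genfromtxt(io.StringIO(ragged), delimiter=",",
                            names=True, invalid_raise=False)

# Option 2: with the delimiters in place, empty fields are read as nan.
filled = np.genfromtxt(io.StringIO(padded), delimiter=",", names=True)

print(skipped["a"], skipped["e"])
print(filled["a"], filled["e"])
```

The first option silently loses the two short rows, so it only suits cases where incomplete records can be discarded.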

You could easily implement it yourself, though, using the csv module.
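A minimal sketch of that approach, assuming every field is numeric and that short rows should be padded with nan. The function name read_ragged_csv is hypothetical, and the data is inlined as a string in place of the file.

```python
import csv
import io
import numpy as np

raw = """a,b,c,d,e
1,2,3,4,2
3,4,5,6,3
3,4,5
1,2
"""

def read_ragged_csv(handle):
    """Read a CSV whose rows may be shorter than the header, padding with nan."""
    reader = csv.reader(handle)
    header = next(reader)
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows with empty fields up to the header width.
        padded = row + [""] * (len(header) - len(row))
        rows.append([float(v) if v.strip() else np.nan for v in padded])
    return header, np.array(rows, dtype=float)

header, table = read_ragged_csv(io.StringIO(raw))

# Map each column name to its values, similar to genfromtxt with names=True.
columns = {name: table[:, i] for i, name in enumerate(header)}
print(columns["a"], columns["b"], columns["e"])
```

To read from disk, pass open("sa.csv", newline="") instead of the StringIO object.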

answered Feb 26 '13 at 11:40

John Zwinck


It takes a negative value -1 – user2086122 Feb 26 '13 at 11:51
How do I perform correlation for the two columns? – user2086122 Feb 26 '13 at 11:52
That sounds like a second, unrelated question. I suggest you post it separately. – John Zwinck Feb 26 '13 at 11:53

score 0 · Answer 3 · answered Nov 23 '15 at 14:08

In my case, I had the special character # inside my data, which caused the problem because genfromtxt treats # as the start of a comment by default. Example:

a,b#,c,d,e
1,2,3,4,2

Solution:

Change the comments character. I changed it to @@@@:

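from numpy import genfromtxt

# `file` is the path to the CSV; [1:] drops the header row, which parses as nan
dataset = genfromtxt(open(file, 'r'), delimiter=',', dtype='f8',
                     comments='@@@@')[1:]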
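A self-contained sketch of the same idea, using an in-memory string in place of the file. The second data row is an assumption borrowed from the question's data.

```python
import io
import numpy as np

# Header contains '#', which genfromtxt would treat as a comment by default.
raw = "a,b#,c,d,e\n1,2,3,4,2\n3,4,5,6,3\n"

# Use a comment marker that never occurs in the data, so '#' is read literally.
parsed = np.genfromtxt(io.StringIO(raw), delimiter=",", dtype="f8",
                       comments="@@@@")

# The header row cannot be converted to floats and comes back as nan; drop it.
dataset = parsed[1:]
print(dataset)
```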
