Input csv file:

a,b,c,d,e
1,2,3,4,2
3,4,5,6,3
3,4,5
1,2

Code:

import numpy as np

data = np.genfromtxt("sa.csv", dtype=None, delimiter=',', names=True)
print data['a'],data['b'],data['e']

I get an error:

Traceback (most recent call last):
  File "cs.py", line 3, in <module>
    data = np.genfromtxt("sa.csv", dtype=None, delimiter=',', names=True)
  File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 1593, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #4 (got 3 columns instead of 5)
    Line #5 (got 2 columns instead of 5)

How can I deal with this and compute correlations between selected columns using statsmodels in Python?

user2086122
  • I've tidied up your post a little bit, although your traceback is still a bit wonky. You may wish to copy and paste the traceback in again. This time, paste it in, highlight the lot and press ctrl+k, or use the little `{}` icon when you click on [edit]. – Jon Clements Feb 26 '13 at 11:12
  • Sorry, I'm new to this; I'll learn it. – user2086122 Feb 26 '13 at 11:35
  • to calculate correlation, you can just use ``np.corrcoef``, or ``np.ma.corrcoef`` if you have missing values, or use pandas. statsmodels doesn't duplicate and doesn't have those functions. – Josef Feb 26 '13 at 14:39

3 Answers

Since you mention statsmodels I assume that you've got its pandas dependency installed. Pandas will parse your example properly:

import pandas as pd
import numpy as np
dat = pd.read_csv('test.csv')
np.corrcoef(dat)

array([[ 1.        ,  0.94174191,         nan,         nan],
       [ 0.94174191,  1.        ,         nan,         nan],
       [        nan,         nan,         nan,         nan],
       [        nan,         nan,         nan,         nan]])

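The NaN entries are expected given the missing values. Note that np.corrcoef treats each row of dat as a variable by default, so this 4x4 matrix correlates the four data rows with each other, not the columns.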
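For correlations between selected columns, which is what the question asks for, one option is the DataFrame's own corr method, which works column-wise and skips missing values pair by pair. The following is only a sketch: it inlines the question's sample data through io.StringIO instead of reading a file, and the choice of columns a, b and e is taken from the question's print statement.

```python
import io

import pandas as pd

# The question's sample file, inlined so the example is self-contained.
csv_text = "a,b,c,d,e\n1,2,3,4,2\n3,4,5,6,3\n3,4,5\n1,2\n"

# read_csv pads the short rows with NaN instead of raising an error.
dat = pd.read_csv(io.StringIO(csv_text))

# Column-wise Pearson correlation on the selected columns;
# NaN values are excluded pairwise.
corr = dat[["a", "b", "e"]].corr()
print(corr)
```

With only four rows, and two of them missing a value for e, the resulting coefficients are not statistically meaningful; the point is only that the ragged file loads and the columns can be selected by name.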

Vincent

You can make it ignore the lines with fewer columns (see "Using genfromtxt to import csv data with missing values in numpy"), but you cannot make it parse them as incomplete. If you put the delimiters in place (i.e. 1,2,,,), it can work, but otherwise I don't think genfromtxt is smart enough to do what you want.
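A short sketch of both options, assuming the question's sample data inlined via io.StringIO rather than read from sa.csv. The first call uses genfromtxt's invalid_raise=False, which skips lines with the wrong number of columns and emits a warning (silenced here). The second call reads a hand-padded copy of the data in which the trailing delimiters have been added.

```python
import io
import warnings

import numpy as np

# Option 1: skip the short lines entirely.
ragged = "a,b,c,d,e\n1,2,3,4,2\n3,4,5,6,3\n3,4,5\n1,2\n"
with warnings.catch_warnings():
    # genfromtxt warns about each skipped line; silence that for the demo.
    warnings.simplefilter("ignore")
    skipped = np.genfromtxt(io.StringIO(ragged), delimiter=",",
                            names=True, dtype=float, invalid_raise=False)
print(skipped["a"], skipped["e"])  # only the two complete rows remain

# Option 2: the same data with delimiters padded in by hand.
padded_text = "a,b,c,d,e\n1,2,3,4,2\n3,4,5,6,3\n3,4,5,,\n1,2,,,\n"
padded = np.genfromtxt(io.StringIO(padded_text), delimiter=",",
                       names=True, dtype=float)
print(padded["a"], padded["e"])  # empty fields are read as nan
```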

You could easily implement it yourself, though, using the csv module.
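One possible way to do that is sketched below. The helper name read_ragged_csv and the choice of nan as the fill value are illustrative assumptions, not part of any library; the sample data is the question's file, inlined so the example runs on its own.

```python
import csv
import io
import math


def read_ragged_csv(handle, fill=math.nan):
    """Read a CSV with a header row, padding short rows with `fill`.

    Returns a dict mapping each column name to a list of floats.
    (Hypothetical helper, written for this example only.)
    """
    reader = csv.reader(handle)
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Pad rows that have fewer fields than the header.
        padded = row + [""] * (len(header) - len(row))
        for name, cell in zip(header, padded):
            columns[name].append(float(cell) if cell.strip() else fill)
    return columns


sample = io.StringIO("a,b,c,d,e\n1,2,3,4,2\n3,4,5,6,3\n3,4,5\n1,2\n")
columns = read_ragged_csv(sample)
print(columns["a"])
print(columns["e"])
```

The resulting lists can be passed to numpy, for example to np.ma.corrcoef after masking the nan entries, as suggested in the comments on the question.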

John Zwinck

In my case, a special character, #, inside my data caused the problem, because genfromtxt treats # as the start of a comment by default. Example:

a,b#,c,d,e
1,2,3,4,2

Solution:

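Change the comments character to something that never appears in the data. I changed it to @@@@:

from numpy import genfromtxt
# comments='@@@@' stops '#' from starting a comment; [1:] drops the header row.
dataset = genfromtxt(open(file, 'r'), delimiter=',', dtype='f8',
                     comments='@@@@')[1:]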
MichaelLo