
I obtained a NumPy record ndarray from a CSV file using

data = matplotlib.mlab.csv2rec('./data.csv', delimiter=b',')

The data set is structured as:

      date,a0,a1,a2,a3, b0, b1, b2, b3,[...], b9
2012-01-01, 1, 2, 3, 4,0.1,0.2,0.3,0.4,[...],0.9

I want to select (in the SQL sense) just columns b0 through b9 from the array, giving the structure

 b0, b1, b2, b3,[...], b9
0.1,0.2,0.3,0.4,[...],0.9

The question "How can I use numpy array indexing to select 2 columns out of a 2D array to select unique values from?" is similar, but slicing data[:,5:] as suggested throws IndexError: too many indices with a record array.
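The error comes from the record array being one-dimensional: each element is a whole row, and the columns are fields of the dtype rather than a second axis. Here is a minimal sketch of this, using a small hand-built array in place of the CSV file (the field names and values are made up for illustration):

```python
import numpy as np

# Hypothetical stand-in for the array returned by the CSV loader.
data = np.array(
    [("2012-01-01", 1, 2, 0.1, 0.2), ("2012-01-02", 3, 4, 0.3, 0.4)],
    dtype=[("date", "U10"), ("a0", int), ("a1", int), ("b0", float), ("b1", float)],
).view(np.recarray)

print(data.ndim, data.shape)  # 1 (2,)

# Two indices on a 1-D array cannot work, whatever the dtype is.
try:
    data[:, 3:]
    failed = False
except IndexError as exc:
    failed = True
    print("IndexError:", exc)
```

The exact wording of the error message varies between NumPy versions.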

Mechanical snail

3 Answers


data[...,0:3] will give you columns 0 through 2.

data[...,[0,2,3]] will give you columns 0, 2 and 3.

The thing is that you have an array of arrays, while the question you referenced is about 2D-arrays, which is slightly different. See also: Numpy Array Column Slicing Produces IndexError: invalid index Exception
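To illustrate the difference, here is a small sketch comparing the same index expression on a plain 2-D array and on a 1-D structured array. The arrays and field names are invented for the example:

```python
import numpy as np

# Plain 2-D array: the ellipsis covers the row axis, the slice picks columns.
plain = np.arange(12).reshape(3, 4)
cols = plain[..., 0:3]
print(cols.shape)  # (3, 3)

# 1-D structured array: there is only one axis, so the slice picks records (rows).
rec = np.array(
    [(1, 0.1, 0.2), (2, 0.3, 0.4), (3, 0.5, 0.6), (4, 0.7, 0.8)],
    dtype=[("a0", int), ("b0", float), ("b1", float)],
)
rows = rec[..., 0:3]
print(rows.shape)  # (3,)
```

In the structured case all fields are still present in the result; only the number of records changes.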

kirelagin
  • `data[..., 0:3]` gives me *rows* 0 through 2. – Mechanical snail May 31 '13 at 15:41
  • @Mechanicalsnail yes, as mentioned in the answer, you actually have a 1D array. Use the information at the link provided to create a view, or start using [pandas](http://pandas.pydata.org) to manipulate a data frame like this more easily – Zeugma May 31 '13 at 15:44
  • @Mechanicalsnail I've double-checked. For me it works, and gives _columns_ 0 through 2. – kirelagin May 31 '13 at 15:46
  • @kirelagin: It seems the behavior is different between record arrays and regular arrays. – Mechanical snail May 31 '13 at 15:50
  • Ahh, right, I see. You basically have a 1-D array of tuples. I'm afraid in this case `numpy`'s indexing can't help you… – kirelagin May 31 '13 at 15:56
  • do `data = np.asarray(data)` and it will convert your array of tuples -> a 2D array, but I think there is a nicer way to deal with this using the fact that you have a numpy record object. – tacaswell May 31 '13 at 17:30

Given that you have a record array, I think the following will work:

data[['b' + str(j) for j in range(10)]]

doc/introduction and cookbook
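As a rough sketch of the same idea on a self-contained array (the data below is invented, and only three b fields are used to keep it short): indexing with a list of field names keeps the array structured, and `numpy.lib.recfunctions.structured_to_unstructured` can then turn the selection into an ordinary 2-D float array. This assumes NumPy 1.16 or later, where that helper is available.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical stand-in for the CSV data.
data = np.array(
    [("2012-01-01", 1, 2, 0.1, 0.2, 0.3), ("2012-01-02", 3, 4, 0.4, 0.5, 0.6)],
    dtype=[("date", "U10"), ("a0", int), ("a1", int),
           ("b0", float), ("b1", float), ("b2", float)],
)

# Build the list of wanted field names; it must be a list, not a tuple.
b_names = [name for name in data.dtype.names if name.startswith("b")]

# Still a 1-D structured array, but with only the b fields.
b_cols = data[b_names]
print(b_cols.dtype.names)  # ('b0', 'b1', 'b2')

# Optional: convert the selection to a plain 2-D array of floats.
b_matrix = rfn.structured_to_unstructured(b_cols)
print(b_matrix.shape)  # (2, 3)
```

Once the selection is a plain 2-D array, ordinary slicing such as `b_matrix[:, 1:]` works as in the referenced question.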

tacaswell
  • I had tried `data[data.dtype.names[5:]]`, which raises `ValueError: invalid literal for long() with base 10`. It turns out that you have to provide a list, not a tuple as `.dtype.names` gives. So `data[list(data.dtype.names[5:])]` (basically what you wrote) works as desired. – Mechanical snail May 31 '13 at 20:35

I know the question has been answered, but I wanted to log this because it is related. It falls somewhere between "Extracting specific columns in numpy array" and "Select Rows from Numpy Rec Array" (but is not quite "How to return a view of several columns in numpy structured array"). This is a syntax I had been looking for for a while, and I finally found it. Let's say this is the data:

import numpy as np

a = np.array([(1.5, 2.5, (1.0,2.0)), (3.,4.,(4.,5.)), (1.,3.,(2.,6.))],
        dtype=[('x',float), ('y',float), ('value',float,(2,2))])

I want something like the SQL query SELECT x,value FROM a WHERE y>=3.0, that is, selecting only certain columns by field name and only the rows that meet some criterion. The right syntax for that is:

a[['x','value']][a['y']>=3.0]
# [(3.0, [[4.0, 5.0], [4.0, 5.0]]) (1.0, [[2.0, 6.0], [2.0, 6.0]])]

While a[a['y']>=3.0] works fine, note that:

>>> print a[a['y']>=3.0]['x','value']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.

... however, if an extra pair of brackets is added, as in a[a['y']>=3.0][['x','value']], everything works again.
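Here is a compact sketch that checks both orderings give the same selection. To keep it simple it assumes a variant of the array above in which the value field has shape (2,) instead of (2,2); the names mask, cols_then_rows and rows_then_cols are only illustrative.

```python
import numpy as np

# Variant of the example data with a 1-D subarray field (an assumption for brevity).
a = np.array(
    [(1.5, 2.5, (1.0, 2.0)), (3.0, 4.0, (4.0, 5.0)), (1.0, 3.0, (2.0, 6.0))],
    dtype=[("x", float), ("y", float), ("value", float, (2,))],
)

mask = a["y"] >= 3.0  # the WHERE clause

# SELECT x, value ... in either order: fields first, or rows first.
cols_then_rows = a[["x", "value"]][mask]
rows_then_cols = a[mask][["x", "value"]]

print(cols_then_rows["x"])      # x of the rows where y >= 3.0
print(rows_then_cols["value"])  # value of the same rows
```

In both cases the field names go in a list (the inner pair of brackets); the order of the two indexing steps does not change which records and fields come back.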

Hope this helps someone,
Cheers!

sdaau