29

I can see several columns (fields) at once in a numpy structured array by indexing with a list of the field names, for example

import numpy as np

a = np.array([(1.5, 2.5, (1.0,2.0)), (3.,4.,(4.,5.)), (1.,3.,(2.,6.))],
        dtype=[('x',float), ('y',float), ('value',float,(2,2))])

print(a[['x','y']])
#[(1.5, 2.5) (3.0, 4.0) (1.0, 3.0)]

print(a[['x','y']].dtype)
#[('x', '<f8'), ('y', '<f8')]

But the problem is that it seems to be a copy rather than a view:

b = a[['x','y']]
b[0] = (9.,9.)

print(b)
#[(9.0, 9.0) (3.0, 4.0) (1.0, 3.0)]

print(a[['x','y']])
#[(1.5, 2.5) (3.0, 4.0) (1.0, 3.0)]

If I only select one column, it's a view:

c = a['y']
c[0] = 99.

print(c)
#[ 99.  4.   3. ]

print(a['y'])
#[ 99.  4.   3. ]

Is there any way I can get the view behavior for more than one column at once?

I have two workarounds: one is to loop through the columns; the other is to create a hierarchical dtype, so that a single column actually returns a structured array with the two (or more) fields that I want. Unfortunately, zip also returns a copy, so I can't do:

x = a['x']; y = a['y']
z = list(zip(x,y))  # list() is needed on Python 3, where zip returns an iterator
z[0] = (9.,9.)
askewchan
  • 5
    An update for anyone living in the present - `a[['x','y']]` now _does_ return a view. – Eric Jan 10 '20 at 18:55

5 Answers

36

You can create a dtype object that contains only the fields you want, and use numpy.ndarray() to create a view of the original array:

import numpy as np
strc = np.zeros(3, dtype=[('x', int), ('y', float), ('z', int), ('t', "i8")])

def fields_view(arr, fields):
    dtype2 = np.dtype({name:arr.dtype.fields[name] for name in fields})
    return np.ndarray(arr.shape, dtype2, arr, 0, arr.strides)

v1 = fields_view(strc, ["x", "z"])
v1[0] = 10, 100

v2 = fields_view(strc, ["y", "z"])
v2[1:] = [(3.14, 7)]

v3 = fields_view(strc, ["x", "t"])

v3[1:] = [(1000, 2**16)]

print(strc)

Here is the output:

[(10, 0.0, 100, 0L) (1000, 3.14, 7, 65536L) (1000, 3.14, 7, 65536L)]
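
For readers wondering why this works, here is a minimal sketch that reuses the strc layout from above (the names sub and view are only illustrative). Each entry of arr.dtype.fields is a (dtype, byte offset) pair, so a dtype built from a subset of those entries addresses the same bytes inside each record:

```python
import numpy as np

strc = np.zeros(3, dtype=[('x', int), ('y', float), ('z', int), ('t', 'i8')])

# Each value in dtype.fields is a (dtype, byte offset) pair.
for name in ('x', 'z'):
    print(name, strc.dtype.fields[name])

# A dtype built from those pairs keeps the original offsets.
sub = np.dtype({name: strc.dtype.fields[name] for name in ('x', 'z')})

# Same buffer and strides as strc, so this is a view, not a copy.
view = np.ndarray(strc.shape, sub, strc, 0, strc.strides)
view['z'] = 5

print(strc['z'])                     # the write is visible through strc
print(np.shares_memory(view, strc))
```

Because the offsets are carried over unchanged, the selected fields do not need to be adjacent in memory.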
Guillaume Jacquenot
HYRY
  • 2
    Ooh, this is nice, works for non-contiguous (or irregularly spaced) fields. – askewchan Feb 17 '14 at 18:24
  • New answer to an old question, @andy, doesn't get much attention that way :P – askewchan Feb 25 '14 at 03:51
  • This fails if the array has any columns not supported by `memoryview`, such as datetime64. See https://github.com/numpy/numpy/issues/4983 for details. – John Zwinck Apr 24 '17 at 06:33
  • The result is not a numpy array anymore but a dict and won't be usable with many numpy functionalities like broadcasting. – Camion May 13 '19 at 02:13
11

Building on @HYRY's answer, you could also use ndarray's method getfield:

import numpy

def fields_view(array, fields):
    return array.getfield(numpy.dtype(
        {name: array.dtype.fields[name] for name in fields}
    ))
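
A short usage sketch, assuming a small all-numeric array (strc and v are illustrative names, not part of the answer), to check that the getfield result really writes through to the original:

```python
import numpy as np

def fields_view(array, fields):
    # Build a dtype that keeps the original byte offsets of the chosen fields.
    sub = np.dtype({name: array.dtype.fields[name] for name in fields})
    # getfield reinterprets each record with that dtype, without copying.
    return array.getfield(sub)

strc = np.zeros(3, dtype=[('x', int), ('y', float), ('z', int)])
v = fields_view(strc, ['x', 'z'])
v['x'] = 7

print(strc['x'])                  # reflects the assignment made through v
print(np.shares_memory(v, strc))
```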
ChristopherC
  • Nice, how did we miss that? Seems cleaner and safer to avoid passing the data and strides. [`getfield`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.getfield.html) does not appear to be a [new](https://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.ndarray.getfield.html) method, though I've never seen it before. – askewchan Aug 29 '16 at 14:45
  • The result is not a numpy array anymore but a dict and won't be usable with many numpy functionalities like broadcasting. – Camion May 13 '19 at 02:13
7

As of NumPy version 1.16, the code you propose returns a view. See 'NumPy 1.16.0 Release Notes->Future Changes->multi-field views return a view instead of a copy' on this page:

https://numpy.org/doc/stable/release/1.16.0-notes.html#multi-field-views-return-a-view-instead-of-a-copy
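
A quick way to check this on your own installation is sketched below, assuming NumPy 1.16 or later. The array is a simplified stand-in for the one in the question. Note that the view keeps the original record size, padding included; numpy.lib.recfunctions.repack_fields gives a packed copy if that is what you need.

```python
import numpy as np
from numpy.lib.recfunctions import repack_fields

a = np.array([(1.5, 2.5, 0), (3.0, 4.0, 1), (1.0, 3.0, 2)],
             dtype=[('x', 'f8'), ('y', 'f8'), ('n', 'i8')])

b = a[['x', 'y']]          # a view on NumPy >= 1.16
b[0] = (9.0, 9.0)

print(a[0])                       # the write made through b shows up in a
print(np.shares_memory(a, b))
print(b.dtype.itemsize)           # same record size as a; 'n' is now padding

packed = repack_fields(b)         # a copy without the padding
print(packed.dtype.itemsize)
```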

mtazzari
5

I don't think there is an easy way to achieve what you want. In general, you cannot take an arbitrary view into an array. Try the following:

>>> a
array([(1.5, 2.5, [[1.0, 2.0], [1.0, 2.0]]),
       (3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
       (1.0, 3.0, [[2.0, 6.0], [2.0, 6.0]])], 
      dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])
>>> a.view(float)
array([ 1.5,  2.5,  1. ,  2. ,  1. ,  2. ,  3. ,  4. ,  4. ,  5. ,  4. ,
        5. ,  1. ,  3. ,  2. ,  6. ,  2. ,  6. ])

The float view of your record array shows you how the actual data is stored in memory. A view into this data has to be expressible as a combination of a shape, strides and offset into the above data. So if you wanted, for instance, a view of 'x' and 'y' only, you could do the following:

>>> from numpy.lib.stride_tricks import as_strided
>>> b = as_strided(a.view(float), shape=a.shape + (2,),
                   strides=a.strides + a.view(float).strides)
>>> b
array([[ 1.5,  2.5],
       [ 3. ,  4. ],
       [ 1. ,  3. ]])

The as_strided call does the same as the following, which is perhaps easier to understand:

>>> bb = a.view(float).reshape(a.shape + (-1,))[:, :2]
>>> bb
array([[ 1.5,  2.5],
       [ 3. ,  4. ],
       [ 1. ,  3. ]])

Either of these is a view into a:

>>> b[0,0] =0
>>> a
array([(0.0, 2.5, [[1.0, 2.0], [1.0, 2.0]]),
       (3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
       (1.0, 3.0, [[2.0, 6.0], [2.0, 6.0]])], 
      dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])
>>> bb[2, 1] = 0
>>> a
array([(0.0, 2.5, [[1.0, 2.0], [1.0, 2.0]]),
       (3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
       (1.0, 0.0, [[2.0, 6.0], [2.0, 6.0]])], 
      dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])

It would be nice if either of these could be converted into a record array, but numpy refuses to do so, for reasons that are not all that clear to me:

>>> b.view([('x',float), ('y',float)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: new type not compatible with array.

Of course what works (sort of) for 'x' and 'y' would not work, for instance, for 'x' and 'value', so in general the answer is: it cannot be done.
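
To make the shape/strides/offset argument concrete, here is a small sketch of the same layout. It assumes an array built with np.zeros and filled afterwards, and xy is an illustrative name. It prints the record size and field offsets, then checks that the two-column float view shares memory with a:

```python
import numpy as np

a = np.zeros(3, dtype=[('x', float), ('y', float), ('value', float, (2, 2))])
a['x'] = [1.5, 3.0, 1.0]
a['y'] = [2.5, 4.0, 3.0]
a['value'] = 1.0

# One record holds 6 floats: x, y and the four entries of value.
print(a.dtype.itemsize)
print({name: a.dtype.fields[name][1] for name in a.dtype.names})

# One row per record, keeping only the first two floats (x and y).
xy = a.view(float).reshape(a.shape + (-1,))[:, :2]
print(xy.strides)    # (bytes between records, bytes between x and y)

xy[0, 0] = 0.0
print(a['x'])        # first entry changed through the view
print(a['value'][0]) # untouched: it lives at a different offset
```

The row stride equals the record size and the column stride equals the distance between 'x' and 'y', which is why this particular pair can be expressed as a plain strided view.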

Jaime
0

In my case 'several columns' happens to be equal to two columns of the same data type, where I can use the following function to make a view:

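import numpy as np

def make_view(arr, fields, dtype):
    # Byte offsets of the two requested fields within each record.
    offsets = [arr.dtype.fields[f][1] for f in fields]
    offset = min(offsets)
    stride = max(offsets)
    # The column step is the byte distance between the two fields, so columns come out in memory order.
    return np.ndarray((len(arr), 2), buffer=arr, offset=offset, strides=(arr.strides[0], stride-offset), dtype=dtype)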

I think this boils down to the same thing @Jaime said: it cannot be done in general, but for two columns of the same dtype it can. The result of this function is not a dict but a good old-fashioned numpy array.
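
A usage sketch, under the assumption of a record with two float64 fields separated by a third one (arr, the field names and xy are made up for illustration):

```python
import numpy as np

def make_view(arr, fields, dtype):
    offsets = [arr.dtype.fields[f][1] for f in fields]
    offset = min(offsets)
    stride = max(offsets)
    return np.ndarray((len(arr), 2), buffer=arr, offset=offset,
                      strides=(arr.strides[0], stride - offset), dtype=dtype)

# 'x' and 'y' are not adjacent: 'm' sits between them.
arr = np.zeros(4, dtype=[('x', 'f8'), ('m', 'f8'), ('y', 'f8')])

xy = make_view(arr, ['x', 'y'], 'f8')
xy[:, 0] = 1.0   # column 0 is the field with the smaller offset ('x')
xy[:, 1] = 2.5   # column 1 is the field with the larger offset ('y')

print(xy.shape, xy.strides)
print(arr)
```

The result is an ordinary 2-D float array, so broadcasting and ufuncs apply to it directly.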

xyzzyqed