How to return a view of several columns in numpy structured array

Question

I can see several columns (fields) at once in a numpy structured array by indexing with a list of the field names, for example

import numpy as np

a = np.array([(1.5, 2.5, (1.0,2.0)), (3.,4.,(4.,5.)), (1.,3.,(2.,6.))],
        dtype=[('x',float), ('y',float), ('value',float,(2,2))])

print(a[['x','y']])
#[(1.5, 2.5) (3.0, 4.0) (1.0, 3.0)]

print(a[['x','y']].dtype)
#[('x', '<f8'), ('y', '<f8')]

But the problem is that it seems to be a copy rather than a view:

b = a[['x','y']]
b[0] = (9.,9.)

print(b)
#[(9.0, 9.0) (3.0, 4.0) (1.0, 3.0)]

print(a[['x','y']])
#[(1.5, 2.5) (3.0, 4.0) (1.0, 3.0)]

If I only select one column, it's a view:

c = a['y']
c[0] = 99.

print(c)
#[ 99.  4.   3. ]

print(a['y'])
#[ 99.  4.   3. ]

Is there any way I can get the view behavior for more than one column at once?

I have two workarounds: one is to loop through the columns, the other is to create a hierarchical dtype, so that a single column actually returns a structured array with the two (or more) fields that I want. Unfortunately, zip also returns a copy, so I can't do:

x = a['x']; y = a['y']
z = zip(x,y)
z[0] = (9.,9.)

An update for anyone living in the present - `a[['x','y']]` now _does_ return a view. — Eric, Jan 10 '20 at 18:55

score 36 · Accepted Answer · edited Dec 24 '17 at 11:26

You can create a dtype object that contains only the fields you want, and use numpy.ndarray() to create a view of the original array:

import numpy as np
strc = np.zeros(3, dtype=[('x', int), ('y', float), ('z', int), ('t', "i8")])

def fields_view(arr, fields):
    dtype2 = np.dtype({name:arr.dtype.fields[name] for name in fields})
    return np.ndarray(arr.shape, dtype2, arr, 0, arr.strides)

v1 = fields_view(strc, ["x", "z"])
v1[0] = 10, 100

v2 = fields_view(strc, ["y", "z"])
v2[1:] = [(3.14, 7)]

v3 = fields_view(strc, ["x", "t"])

v3[1:] = [(1000, 2**16)]

print(strc)

Here is the output:

[(10, 0.0, 100, 0L) (1000, 3.14, 7, 65536L) (1000, 3.14, 7, 65536L)]
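
As a quick sanity check on this approach, the sketch below redefines `fields_view` so it runs on its own and confirms that the result is an ordinary `ndarray` sharing memory with the original. The fixed-width `'i8'`/`'f8'` field types and the variable names are assumptions chosen only to keep the byte offsets predictable.

```python
import numpy as np

def fields_view(arr, fields):
    # Same helper as above: a dtype holding only the chosen fields,
    # each kept at its original byte offset.
    dtype2 = np.dtype({name: arr.dtype.fields[name] for name in fields})
    return np.ndarray(arr.shape, dtype2, arr, 0, arr.strides)

strc = np.zeros(3, dtype=[('x', 'i8'), ('y', 'f8'), ('z', 'i8')])
v = fields_view(strc, ['x', 'z'])

is_ndarray = isinstance(v, np.ndarray)   # a regular array object
shares = np.shares_memory(v, strc)       # no data was copied
z_offset = v.dtype.fields['z'][1]        # 'z' keeps its offset, skipping 'y'

v[0] = (10, 100)                         # writes through to strc
print(is_ndarray, shares, z_offset, strc[0])
```

The view's dtype records the gap left by the skipped field `'y'` as an explicit offset, which is why this works for fields that are not adjacent in memory.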

answered Feb 17 '14 at 01:36 by HYRY · edited Dec 24 '17 at 11:26 by Guillaume Jacquenot

Ooh, this is nice, works for non-contiguous (or irregularly spaced) fields. – askewchan Feb 17 '14 at 18:24
New answer to an old question, @andy, doesn't get much attention that way :P – askewchan Feb 25 '14 at 03:51
This fails if the array has any columns not supported by `memoryview`, such as datetime64. See https://github.com/numpy/numpy/issues/4983 for details. – John Zwinck Apr 24 '17 at 06:33
The result is not a numpy array anymore but a dict and won't be usable with many numpy functionalities like broadcasting. – Camion May 13 '19 at 02:13

score 11 · Answer 2 · answered Aug 26 '16 at 04:32

Building on @HYRY's answer, you could also use ndarray's method getfield:

import numpy as np

def fields_view(array, fields):
    # Build a dtype with only the requested fields at their original offsets
    return array.getfield(np.dtype(
        {name: array.dtype.fields[name] for name in fields}
    ))
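
A short usage sketch for the `getfield` variant, assuming a small made-up array (`data`, with hypothetical fields `x`, `y`, `z`). It shows writes going through the view, both per column and per record, and that the helper also accepts a strided slice of the original.

```python
import numpy as np

def fields_view(array, fields):
    # getfield reinterprets each record with the reduced dtype; no copy is made
    return array.getfield(np.dtype(
        {name: array.dtype.fields[name] for name in fields}
    ))

data = np.zeros(4, dtype=[('x', 'f8'), ('y', 'i8'), ('z', 'f8')])

xz = fields_view(data, ['x', 'z'])
xz['z'] = 2.5        # assign a whole column through the view
xz[1] = (7.0, 8.0)   # assign a single record through the view

# The helper also works on a non-contiguous slice of the original
every_other = fields_view(data[::2], ['x', 'z'])

print(data)
print(np.shares_memory(every_other, data))
```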

answered Aug 26 '16 at 04:32 by ChristopherC

Nice, how did we miss that? Seems cleaner and safer to avoid passing the data and strides. [`getfield`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.getfield.html) does not appear to be a [new](https://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.ndarray.getfield.html) method, though I've never seen it before. – askewchan Aug 29 '16 at 14:45
The result is not a numpy array anymore but a dict and won't be usable with many numpy functionalities like broadcasting. – Camion May 13 '19 at 02:13

score 7 · Answer 3 · edited Feb 16 '22 at 14:19

As of NumPy version 1.16, the code you propose returns a view. See 'NumPy 1.16.0 Release Notes -> Future Changes -> multi-field views return a view instead of a copy' on this page:

https://numpy.org/doc/stable/release/1.16.0-notes.html#multi-field-views-return-a-view-instead-of-a-copy
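
A minimal sketch of what this looks like in practice, assuming NumPy 1.16 or later is installed. The array mirrors the question's dtype; `repack_fields` from `numpy.lib.recfunctions` is used here to get a packed copy when a view is not wanted.

```python
import numpy as np
from numpy.lib.recfunctions import repack_fields

a = np.zeros(3, dtype=[('x', float), ('y', float), ('value', float, (2, 2))])

b = a[['x', 'y']]                  # multi-field index
is_view = np.shares_memory(a, b)   # expected to be True on NumPy >= 1.16
b[0] = (9.0, 9.0)                  # so this write reaches a

# The view keeps the original record size, padding included.
# repack_fields produces an independent, tightly packed copy instead.
packed = repack_fields(a[['x', 'y']])

print(is_view, a['x'][0], b.dtype.itemsize, packed.dtype.itemsize)
```

Code written for older NumPy that relied on the copy behavior (for example, modifying the result without touching the original) needs an explicit copy such as the packed one above.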

answered Mar 20 '17 at 09:35 by Anders Lindstrom · edited Feb 16 '22 at 14:19 by mtazzari

Only in numpy 1.16 was this feature implemented – panda-34 Mar 07 '19 at 15:38
This should be the accepted answer in 2020. – Davor Cubranic Feb 17 '21 at 15:34

score 5 · Answer 4 · answered Mar 03 '13 at 07:47

I don't think there is an easy way to achieve what you want. In general, you cannot take an arbitrary view into an array. Try the following:

>>> a
array([(1.5, 2.5, [[1.0, 2.0], [1.0, 2.0]]),
       (3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
       (1.0, 3.0, [[2.0, 6.0], [2.0, 6.0]])], 
      dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])
>>> a.view(float)
array([ 1.5,  2.5,  1. ,  2. ,  1. ,  2. ,  3. ,  4. ,  4. ,  5. ,  4. ,
        5. ,  1. ,  3. ,  2. ,  6. ,  2. ,  6. ])

The float view of your record array shows you how the actual data is stored in memory. A view into this data has to be expressible as a combination of a shape, strides and offset into the above data. So if you wanted, for instance, a view of 'x' and 'y' only, you could do the following:

>>> from numpy.lib.stride_tricks import as_strided
>>> b = as_strided(a.view(float), shape=a.shape + (2,),
                   strides=a.strides + a.view(float).strides)
>>> b
array([[ 1.5,  2.5],
       [ 3. ,  4. ],
       [ 1. ,  3. ]])

The as_strided call does the same as the following, which is perhaps easier to understand:

>>> bb = a.view(float).reshape(a.shape + (-1,))[:, :2]
>>> bb
array([[ 1.5,  2.5],
       [ 3. ,  4. ],
       [ 1. ,  3. ]])

Either of these is a view into a:

>>> b[0,0] =0
>>> a
array([(0.0, 2.5, [[1.0, 2.0], [1.0, 2.0]]),
       (3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
       (1.0, 3.0, [[2.0, 6.0], [2.0, 6.0]])],
      dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])
>>> bb[2, 1] = 0
>>> a
array([(0.0, 2.5, [[1.0, 2.0], [1.0, 2.0]]),
       (3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
       (1.0, 0.0, [[2.0, 6.0], [2.0, 6.0]])],
      dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])

It would be nice if either of these could be converted into a record array, but numpy refuses to do so, for reasons that are not all that clear to me:

>>> b.view([('x',float), ('y',float)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: new type not compatible with array.

Of course what works (sort of) for 'x' and 'y' would not work, for instance, for 'x' and 'value', so in general the answer is: it cannot be done.
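
The memory layout this answer reasons about can be inspected directly. The sketch below assumes a zero-filled array with the question's dtype (the values do not affect the layout) and reads the byte offset of each field, the record size, and the flat float view.

```python
import numpy as np

a = np.zeros(3, dtype=[('x', float), ('y', float), ('value', float, (2, 2))])

# Byte offset of every field inside one record
offsets = {name: a.dtype.fields[name][1] for name in a.dtype.names}
record_size = a.dtype.itemsize      # bytes per record
flat = a.view(float)                # all floats as stored in memory
rows = flat.reshape(len(a), -1)     # one row of floats per record

print(offsets, record_size, a.strides, rows.shape)
```

Within each row of six floats, 'x' is element 0 and 'value' occupies elements 2 to 5, with 'y' in between. No single constant stride selects exactly those elements, which is why the plain float view cannot cover 'x' and 'value' together, whereas a structured dtype with explicit field offsets can.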

xyzzyqed · Answer 5 · 2020-05-06T17:16:27.857

In my case 'several columns' happens to be equal to two columns of the same data type, where I can use the following function to make a view:

import numpy as np

def make_view(arr, fields, dtype):
    # Byte offsets of the two requested fields within each record
    offsets = [arr.dtype.fields[f][1] for f in fields]
    offset = min(offsets)
    stride = max(offsets)
    return np.ndarray((len(arr), 2), buffer=arr, offset=offset,
                      strides=(arr.strides[0], stride - offset), dtype=dtype)

I think this boils down to the same thing @Jamie said: it cannot be done in general, but for two columns of the same dtype it can. The result of this function is not a dict but a good old-fashioned numpy array.
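
A usage sketch for `make_view`, assuming a made-up array `pts` whose two float fields are separated by an unrelated integer field (`label` is hypothetical). It shows that the result is a plain 2-D float array that supports broadcasting and writes back into the structured array.

```python
import numpy as np

def make_view(arr, fields, dtype):
    offsets = [arr.dtype.fields[f][1] for f in fields]
    offset = min(offsets)
    stride = max(offsets)
    return np.ndarray((len(arr), 2), buffer=arr, offset=offset,
                      strides=(arr.strides[0], stride - offset), dtype=dtype)

pts = np.zeros(3, dtype=[('x', 'f8'), ('label', 'i8'), ('y', 'f8')])
pts['x'] = [1.5, 3.0, 1.0]
pts['y'] = [2.5, 4.0, 3.0]

xy = make_view(pts, ['x', 'y'], 'f8')   # shape (3, 2), columns are x and y
xy *= 2                                 # broadcasted in-place update

print(xy)
print(pts)
```

The second stride is simply the byte distance between the two fields, so the skipped `label` field is never touched.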
