The title may be a little confusing, so I hope I can make it clearer with an example. Imagine I have a little helper function that adds new fields to existing structured arrays:

import numpy as np


def add_field(a, *descr):
    b = np.empty(a.shape, dtype=a.dtype.descr + [*descr])
    for name in a.dtype.names:
        b[name] = a[name]
    return b

Given a structured array, I can simply use it to add new fields:

a = np.array(
    [(1, False), (2, False), (3, False), (4, True)],
    dtype=[('id', 'i4'), ('used', '?')]
)
print(a)
b = add_field(a, ('new', 'O'))
print(b)

I can then set an entry of the newly created field to an (empty) list without a problem:

b[0]['new'] = []

I can also create a new array which is only a slice of the original one and then add a new field to this new array:

c = a[0]
print(c)
d = add_field(c, ('newer', 'O'))
print(d)

BUT if I now try to set the new field to an (empty) list, it doesn't work:

d['newer'] = []

ValueError: assignment to 0-d array

Why is that? According to add_field, d is an entirely new array that happens to share the same fields and entries, just like b did. Interestingly, b[0] and d both have shape (), but type(b[0]) is np.void while type(d) is np.ndarray. Maybe that has something to do with it? Also interestingly, all of this works:

d['newer'] = 1.34
d['newer'] = False
d['newer'] = None
d['newer'] = add_field
d['newer'] = set()
d['newer'] = {}
d['newer'] = {'test': []}

However, accessing the values in the last dict using the key 'test' does not:

>>> d['newer'] = {'test': []}
>>> d['newer']
array({'test': []}, dtype=object)
>>> d['newer']['test']
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
>>> d['newer'][0]
IndexError: too many indices for array

This is very confusing.
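A minimal sketch of what seems to be happening, using the original add_field from above. Indexing a with an integer returns a single record (np.void) of shape (), so np.empty builds a 0-d array, and a 0-d array can only be indexed with an empty tuple. The variable names stored and inner are only for illustration.

```python
import numpy as np


def add_field(a, *descr):
    # Original helper from the question: copies the shape of the input as-is.
    b = np.empty(a.shape, dtype=a.dtype.descr + [*descr])
    for name in a.dtype.names:
        b[name] = a[name]
    return b


a = np.array(
    [(1, False), (2, False), (3, False), (4, True)],
    dtype=[('id', 'i4'), ('used', '?')]
)

c = a[0]                          # a single record (np.void), shape ()
d = add_field(c, ('newer', 'O'))  # 0-d ndarray, because c.shape is ()

d['newer'] = {'test': []}
stored = d['newer'][()]           # the empty tuple is the only valid index for a 0-d array
inner = stored['test']            # now the dict can be used normally

d['newer'][()] = []               # storing a list also goes through the empty-tuple index
```

Here d['newer'] is itself a 0-d object array wrapping the dict, which is why indexing it with 'test' or 0 raises an IndexError.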

EDIT

Okay, I just tried to modify the add_field function like this:

def add_field(a, *descr):
    shape = a.shape if len(a.shape) else (1,)
    b = np.empty(shape, dtype=a.dtype.descr + [*descr])
    for name in a.dtype.names:
        b[name] = a[name]
    return b

But this didn't help:

>>> d = add_field(a[0], ('newer', 'O'))
>>> d
array([(1, False, None)], dtype=[('id', '<i4'), ('used', '?'), ('newer', 'O')])
>>> d.shape
(1,)
>>> d['newer'] = []
ValueError: cannot copy sequence with size 0 to array axis with dimension 1

So this was not it I guess. However this now works:

>>> d['newer'][0] = []

But I don't like this workaround. I would expect it to work the same as for b[0].

EDIT 2

If I modify the add_field function a little bit further, I can force the wanted behaviour, although I don't 100% like it:

def add_field(a, *descr):
    shape = a.shape if len(a.shape) else (1,)
    b = np.empty(shape, dtype=a.dtype.descr + [*descr])
    for name in a.dtype.names:
        b[name] = a[name]
    return b if len(a.shape) else b[0]

d = add_field(a[0], ('newer', 'O'))
d['newer'] = []
mapf
  • Does the same thing happen if you take a larger slice of the structured array? E.g. `a[:2]` – William Miller Jan 16 '20 at 09:02
  • You mean if `c = a[:2]`? In that case it works. I feel like it has something to do with the shape because `add_field` uses the shape of the input array and for some reason the shape of `a[0]` is `()` even though I guess it should be `(1,)`? – mapf Jan 16 '20 at 09:12
  • Yeah that’s what I meant.... I think the shape is the root of it, though I’ve no idea why `a[0]` is shape `()` – William Miller Jan 16 '20 at 09:13
  • What is `a.shape`? – William Miller Jan 16 '20 at 09:14
  • `a.shape` is `(4,)` as it should be I guess. See my edit where I tried to fix it. – mapf Jan 16 '20 at 09:19
  • I think the issue is when you take `a[0]` where `a.shape` is `(4,)` you're not taking a slice you're taking a single array element which should have shape `()` not `(1,)`.... shot in the dark but what does `c = [a[0]]` give you? – William Miller Jan 16 '20 at 09:22
  • In the edit have you checked that `b` is non-empty before `return b`? – William Miller Jan 16 '20 at 09:28
  • That doesn't work at all because `[a[0]]` is a list and therefore doesn't have the `shape` attribute. – mapf Jan 16 '20 at 09:28
  • > *In the edit have you checked that `b` is non-empty on return.* That was at least the intention. But I just noticed that the edit sort of works. With the edit, you can now do this: `d['newer'][0] = []`. I don't like it though. It should just behave as if the input array was longer. – mapf Jan 16 '20 at 09:32
  • Maybe it’s not supposed to have shape `(1,)`... what are `b.shape` and `b[0].shape` in the first example? – William Miller Jan 16 '20 at 09:34
  • `b.shape` is `(4,)` and `b[0].shape` is `()`. – mapf Jan 16 '20 at 09:38
  • Okay that is **weird**.... I'm out of ideas at the moment, will have to investigate further – William Miller Jan 16 '20 at 09:42
  • With the edit to correct the shape you can do `d[0]['newer'] = []` without error, so it seems to behave the same way as with `b[0]['new']` - though I’m not entirely sure why – William Miller Jan 16 '20 at 09:55
  • This actually gave me the idea for a workaround that is kinda ok but I don't 100% like it. See the 2nd edit. – mapf Jan 16 '20 at 10:03
  • When I tested it `d[0]['newer'] = []` worked without error - seems to me that `d[0]` is behaving like `b[0]` with that edit – William Miller Jan 16 '20 at 10:05
  • If `d.shape` is `()`, 0d, it can only be indexed `d[()]`. List assignment to an object dtype array can have broadcasting problems – hpaulj Jan 16 '20 at 10:20
  • Oh wow, I didn't even know `[()]` is a valid index! Crazy. `d['newer'][()] = []` actually works with the original add_field version. I'm still very confused though. – mapf Jan 16 '20 at 10:31
  • I've summarized these comments and attempted to explain the behavior in an answer, hope it helps – William Miller Jan 17 '20 at 04:17

1 Answer

To summarize the comments:

The issue in the original question appears to be the shape of the returned object - when you do e.g.

c = a[0]

with a having shape (n,) you are not taking a slice from the array but a single element. c.shape then is (). When you pass an array of shape () into add_field then the new array created by

b = np.empty(a.shape, dtype=a.dtype.descr + [*descr])

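will also have shape (). However, a 0-d array cannot be indexed by integer position (only with `()` or `...`), and sequence assignment to its fields fails as shown in the question, so for the usage here the structured array needs shape (n,) (though this is not spelled out in the documentation).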
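To make the difference between an element and a slice concrete, here is a small sketch. The names element, window, picked and grown are hypothetical and only serve the illustration.

```python
import numpy as np

a = np.array(
    [(1, False), (2, False), (3, False), (4, True)],
    dtype=[('id', 'i4'), ('used', '?')]
)

element = a[0]    # integer index: one record, shape ()
window = a[0:1]   # slice: still an array, shape (1,)
picked = a[[0]]   # index list: also an array, shape (1,)

# An array built from the slice's shape is 1-d, so integer indexing works on it.
grown = np.empty(window.shape, dtype=window.dtype.descr + [('newer', 'O')])
grown['newer'][0] = []
```

So if a one-record array is what you want, passing `a[0:1]` instead of `a[0]` to the original add_field avoids the 0-d case altogether.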

As in the first edit to the question, the correct modification would be

def add_field(a, *descr):
    shape = a.shape if len(a.shape) else (1,)
    b = np.empty(shape, dtype=a.dtype.descr + [*descr])
    b[list(a.dtype.names)] = a
    return b

The returned object will then share the properties of a shape (n,) structured array in that:

  1. If you index the array at an integer position you get a structure (e.g. d[0])
  2. You can access and modify individual fields of a structured array by indexing with the field name (e.g. d['newer'])

With the above modification the behavior of d in the question is the same as b e.g.

d[0]['newer'] = []

is valid, as is

b[0]['new'] = []

This brings us to the real crux of the question:


Why can't we assign an empty list to each element of a field using the d['newer']=[] syntax?

    When you assign an iterable instead of a scalar using this syntax, numpy attempts an element-wise assignment (or a broadcast depending on the iterable). This differs from the assignment of a scalar wherein the scalar is assigned to every element of that field. The documentation is not clear on this point, but we can get a much more helpful error by using

b['new'] = np.array([])
Traceback (most recent call last):
  File "structuredArray.py", line 20, in <module>
    b['new'] = np.array([])
ValueError: could not broadcast input array from shape (0) into shape (4)
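The same distinction can be reproduced without a structured array. The sketch below uses a plain object array named col (a stand-in for the field, not something from the question): a scalar is copied into every element, while an empty array is treated as a sequence of zero values and cannot be broadcast.

```python
import numpy as np

col = np.empty(4, dtype=object)

# A scalar on the right-hand side is written to every element.
col[:] = 1.34
after_scalar = list(col)

# An empty array is a sequence of length 0, which cannot fill 4 slots.
error = None
try:
    col[:] = np.array([])
except ValueError as exc:
    error = exc
```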

So the issue here isn't how the field is being added, but how you are attempting to assign an empty list to each element of that field. Because a list on the right-hand side is treated as a sequence of values, it needs one entry per element of the field. One way to do this would be something like

b['new'] = [[] for _ in range(b.shape[0])]

Note that `[[]*b.shape[0]]` is not equivalent: `[]*n` is just `[]`, so that expression evaluates to `[[]]`, a one-element sequence that numpy broadcasts across the field. The list comprehension creates a separate list per element and works as expected for structured arrays of both (1,) and (4,) shape:

import numpy as np

def add_field(a, *descr):
    shape = a.shape if len(a.shape) else (1,)
    b = np.empty(shape, dtype=a.dtype.descr + [*descr])
    for name in a.dtype.names:
        b[name] = a[name]
    return b

a = np.array(
    [(1, False), (2, False), (3, False), (4, True)],
    dtype=[('id', 'i4'), ('used', '?')]
)

b = add_field(a, ('new', 'O'))
b['new'] = [[] for _ in range(b.shape[0])]
print(b)

c = a[0]
d = add_field(c, ('newer', 'O'))
d['newer'] = [[] for _ in range(d.shape[0])]
print(d)

Output:

[(1, False, list([])) (2, False, list([])) (3, False, list([])) (4,  True, list([]))]
[(1, False, list([]))]
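If you would rather not rely on how numpy interprets a nested list on the right-hand side, an explicit loop is a slower but unambiguous alternative. This is only a sketch: fill_with_empty_lists is a hypothetical helper, and it assumes a 1-d structured array with an object field.

```python
import numpy as np


def fill_with_empty_lists(arr, field):
    # arr[field] is a view, so writing into it updates arr itself.
    column = arr[field]
    for i in range(column.shape[0]):
        column[i] = []  # a fresh list per element, never shared
    return arr


b = np.empty(4, dtype=[('id', 'i4'), ('used', '?'), ('new', 'O')])
fill_with_empty_lists(b, 'new')

# Mutating one element's list leaves the others untouched.
b['new'][0].append('x')
```

Because each element gets its own list, appending to one record does not affect the others.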
William Miller
  • I just wanted to mention a small tweak I came up with which is probably a little bit faster. Instead of the for loop you can use `b[list(a.dtype.names)] = a` – mapf May 04 '20 at 14:28
  • @mapf That is definitely much more pythonic, I've included it in my answer. Thanks for the suggestion – William Miller May 05 '20 at 03:14