Completely nesting NumPy structured scalars

Question

In the NumPy docs and in other StackOverflow questions, nested NumPy structured scalars are mentioned. Everywhere I've seen this, a nested structured scalar is described as a scalar that contains another scalar (obviously), but the inner scalar is always of a different dtype. What I'd like is a NumPy dtype that has its own dtype as one of its fields.

A simple example of this would be a dtype to represent a tree node, which would store some value (like an integer) and another tree node representing its parent.

It seems this should be done using numpy.void, but I've been unable to do it using a dtype like the following:

node_dtype = np.dtype([("parent", np.void), ("info", np.uint8)])

Having mixed types is actually [well documented](https://docs.scipy.org/doc/numpy/user/basics.rec.html), and super awesome. — Sam Ragusa, May 12 '18 at 10:11
I checked the `structured-array` tag after posting that comment and noticed that I have a big blind spot. Thanks :) — roganjosh, May 12 '18 at 10:13

score 2 · Accepted Answer · answered May 12 '18 at 16:59

np.void

I suppose you thought np.void would work since the type of a structured array record is void:

In [32]: node_dtype = np.dtype([("parent", np.void), ("info", np.uint8)])
In [33]: np.zeros(3, node_dtype)
Out[33]: 
array([(b'', 0), (b'', 0), (b'', 0)],
      dtype=[('parent', 'V'), ('info', 'u1')])
In [34]: type(_[0])
Out[34]: numpy.void

But notice that

In [35]: __['parent']
Out[35]: array([b'', b'', b''], dtype='|V0')

That field occupies 0 bytes.

In [36]: np.zeros(3, np.void)
Out[36]: array([b'', b'', b''], dtype='|V0')
In [37]: np.zeros(3, np.void(0))
Out[37]: array([b'', b'', b''], dtype='|V0')
In [38]: np.zeros(3, np.void(5))
Out[38]: 
array([b'\x00\x00\x00\x00\x00', b'\x00\x00\x00\x00\x00',
       b'\x00\x00\x00\x00\x00'], dtype='|V5')
In [39]: _[0] = b'12345'

np.void normally takes an argument, an integer specifying the length.
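
To make the length point concrete, here is a minimal sketch comparing an unsized void dtype with a sized one. The names `unsized`, `sized`, and `sized_node` are only illustrative, and the 5-byte `'V5'` field is an arbitrary raw-bytes placeholder, not a working parent reference.

```python
import numpy as np

# A bare np.void carries no length, so the dtype reserves zero bytes.
unsized = np.dtype(np.void)
print(unsized.itemsize)

# Stating the length ('V5' = 5 raw bytes) reserves real storage.
sized = np.dtype("V5")
print(sized.itemsize)

# Inside a structured dtype, the sized field adds to the record size:
# 5 bytes for 'parent' plus 1 byte for 'info' (fields are packed by default).
sized_node = np.dtype([("parent", "V5"), ("info", np.uint8)])
print(sized_node.itemsize)
```

Even with a length, the field is just an opaque block of bytes, so this only illustrates why the original `np.void` field ended up empty.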

While it is possible to nest dtypes, the result must still have a known itemsize:

In [57]: dt0 = np.dtype('i,f')
In [58]: dt1 = np.dtype([('f0','U3'), ('nested',dt0)])
In [59]: dt1
Out[59]: dtype([('f0', '<U3'), ('nested', [('f0', '<i4'), ('f1', '<f4')])])
In [60]: dt1.itemsize
Out[60]: 20

The resulting array will have a data buffer of known size, just enough to hold arr.size items of arr.itemsize bytes each.
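
A small sketch of that size relationship, using the same nested layout with explicit field widths (`inner`, `outer`, and the array length of 4 are illustrative choices):

```python
import numpy as np

# Same layout as dt0/dt1 above, with byte widths spelled out.
inner = np.dtype([("f0", "<i4"), ("f1", "<f4")])      # 4 + 4 = 8 bytes
outer = np.dtype([("f0", "<U3"), ("nested", inner)])  # 12 + 8 = 20 bytes

arr = np.zeros(4, dtype=outer)

# The buffer is exactly size * itemsize bytes, fixed at creation time.
print(outer.itemsize)
print(arr.nbytes == arr.size * arr.itemsize)

# Nested fields are addressed by chaining field names; both steps are views,
# so the assignment writes into arr's own buffer.
arr["nested"]["f0"] = np.arange(4)
print(arr[2]["nested"]["f0"])
```

Because every record must fit in that fixed number of bytes, a dtype that contained itself would have no finite itemsize.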

object dtype

You can construct a structured array with object dtype fields:

In [61]: arr = np.empty(3, 'O,i')
In [62]: arr
Out[62]: 
array([(None, 0), (None, 0), (None, 0)],
      dtype=[('f0', 'O'), ('f1', '<i4')])
In [63]: arr[1]['f0']=arr[0]
In [64]: arr[2]['f0']=arr[1]
In [65]: arr
Out[65]: 
array([(None, 0), ((None, 0), 0), (((None, 0), 0), 0)],
      dtype=[('f0', 'O'), ('f1', '<i4')])
In [66]: arr[0]['f1']=100
In [67]: arr
Out[67]: 
array([(None, 100), ((None, 100),   0), (((None, 100), 0),   0)],
      dtype=[('f0', 'O'), ('f1', '<i4')])
In [68]: arr[1]['f1']=200
In [69]: arr[2]['f1']=300
In [70]: arr
Out[70]: 
array([(None, 100), ((None, 100), 200), (((None, 100), 200), 300)],
      dtype=[('f0', 'O'), ('f1', '<i4')])

I don't know whether this would be a particularly useful structure. A list might be just as good:

In [71]: arr.tolist()
Out[71]: [(None, 100), ((None, 100), 200), (((None, 100), 200), 300)]
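
Applied to the tree-node example from the question, the object-field approach might look like the sketch below. The field names `parent` and `info` come from the question; the helper `depth` and the sample values are hypothetical. It relies on the behaviour shown above, where the record obtained by integer indexing stays linked to the array's data.

```python
import numpy as np

# 'parent' holds an arbitrary Python object; 'info' is the node payload.
nodes = np.empty(3, dtype=[("parent", "O"), ("info", "u1")])
nodes["info"] = [10, 20, 30]

# Link each node to its parent one record at a time (node 0 is the root,
# its 'parent' stays None).
nodes[1]["parent"] = nodes[0]
nodes[2]["parent"] = nodes[1]

# Updating a value through the child's parent reference changes the array.
nodes[2]["parent"]["info"] = 99
print(nodes["info"])


def depth(node):
    """Count the parent links between a node and the root (hypothetical helper)."""
    steps = 0
    while node["parent"] is not None:
        node = node["parent"]
        steps += 1
    return steps


print(depth(nodes[2]))
```

The object field stores only a pointer, so the record size stays fixed no matter how deep the chain of parents is.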

Thank you for your awesome explanation, I wouldn't have thought to use the np.object type as a structured array field! If you were curious, my use for this structure is representing a game tree for a [Numba compiled chess engine](https://github.com/SamRagusa/Batch-First), and aside from holding the node's info in a structured scalar (the non-object field), it will only ever be used to check or update a value of its parent, and this will always be done in batches. — Sam Ragusa, May 12 '18 at 20:32

Paul Panzer · Answer 2 · 2018-05-12T12:19:25.990

Trying this crashed numpy for me:

>>> import numpy as np
>>>
# normal compound dtype, no prob
>>> L = [('f1', int), ('f2', float), ('f3', 'U4')]
>>> np.dtype(L)
dtype([('f1', '<i8'), ('f2', '<f8'), ('f3', '<U4')])
>>> 
# dtype containing itself
>>> L.append(('f4', L))
>>> L
[('f1', <class 'int'>), ('f2', <class 'float'>), ('f3', 'U4'), ('f4', [...])]
>>> np.dtype(L)
Speicherzugriffsfehler (Speicherabzug geschrieben)
# and that is German for segfault (core dumped)

Considering the conceptual problems in interpreting this structure, let alone automatically coming up with a memory layout for it, I'm not surprised it doesn't work, though, obviously, it shouldn't crash.

What surprised me most is how many letters it took to say "segfault (core dumped)" in German. — Sam Ragusa, May 14 '18 at 07:33

Paul Panzer · Answer 3 · 2018-05-13T00:22:44.880


I couldn't help playing with @hpaulj's very neat solution.

There is one thing that bit me which I feel is useful to know.

It doesn't work --- or at least doesn't work the same --- in bulk:

>>> import numpy as np
>>> 
>>> arr = np.empty(4, 'O,i')
>>> arr['f1'] = np.arange(4)
>>> 
# assign one by one:
# ------------------
>>> for i in range(4): arr[i]['f0'] = arr[(i+1) % 4]
... 
# individual elements link up nicely:
>>> arr[0]['f0']['f0'] is arr[1]['f0']
True
>>> print([(a['f1'], a['f0']['f1'], a['f0']['f0']['f1']) for a in arr])
[(0, 1, 2), (1, 2, 3), (2, 3, 0), (3, 0, 1)]
# but don't try it in bulk:
>>> print(arr['f1'], arr['f0']['f1'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
>>> 
>>> arr = np.empty(4, 'O,i')
>>> arr['f1'] = np.arange(4)
>>> 
# assign in bulk:
# ---------------
>>> arr['f0'][[3,0,1,2]] = arr
>>> 
# no linking up:
>>> arr[0]['f0']['f0'] is arr[1]['f0']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: tuple indices must be integers or slices, not str
>>> print([(a['f1'], a['f0']['f1'], a['f0']['f0']['f1']) for a in arr])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
TypeError: tuple indices must be integers or slices, not str
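
The tracebacks above suggest that the bulk assignment stores plain tuples, that is, copies of each record, rather than records linked to the array. The sketch below checks this under that assumption; `bulk` and `linked` are illustrative names, and the behaviour may vary between NumPy versions.

```python
import numpy as np

# Bulk assignment: each record is converted to a tuple on the way in.
bulk = np.empty(4, "O,i")
bulk["f1"] = np.arange(4)
bulk["f0"][[3, 0, 1, 2]] = bulk
print(type(bulk[0]["f0"]))

# The stored tuple is a snapshot; later changes to the array do not reach it.
bulk["f1"][1] = 50
print(bulk[0]["f0"][1])

# One-by-one assignment stores records that stay linked to the array.
linked = np.empty(4, "O,i")
linked["f1"] = np.arange(4)
for i in range(4):
    linked[i]["f0"] = linked[(i + 1) % 4]
linked["f1"][1] = 50
print(linked[0]["f0"]["f1"])
```

The linked version forms a cycle here, so printing `linked` itself is best avoided.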

edited May 13 '18 at 00:22

answered May 13 '18 at 00:12


The linking is the key issue, and I believe it is referenced in the docs [here](https://docs.scipy.org/doc/numpy/user/basics.rec.html#viewing-structured-arrays-containing-objects). It also seems the errors you're having come from NumPy not liking your indexing: you can index the nested scalars with integers instead of strings to avoid them. Also try setting arr[0]['f0']=arr[3] and arr[1]['f0']=arr[3], then change the value of arr[0]['f0'][1]. While the "is" ownership test fails, the values are updated in the way you'd think. – Sam Ragusa May 13 '18 at 01:45
I don't think my use case will actually be affected by this limitation! Since the parent node for any node will never change, only the values contained within it will. – Sam Ragusa May 13 '18 at 01:51
@SamRagusa that docs bit is a nice find! Still, the difference between the two cases is genuine: When assigning one-by-one you can do ridiculous things like `arr[0]['f0']['f0']['f0']['f0']['f0']['f0']['f0']['f0']['f0'] is arr[0]['f0']` -> `True` whereas in the bulk case even using integer indexing we will always have finite depth: `arr[0][0][0]` -> `None`, `arr[0][0][0][0]` -> error. - But if your use case isn't affected, all the better. – Paul Panzer May 13 '18 at 03:06
What error are you referring to? You can even do `b=arr[0]`, `b[0] = b` and then endlessly loop incrementing the same node value. The error I've been getting is a RecursionError, and it only seems to happen when printing the array or scalar, not when using them. – Sam Ragusa May 13 '18 at 03:32
@SamRagusa I mean the second case, i.e. start with a fresh empty array `arr = np.empty(4, 'O,i')`, `arr['f1'] = range(4)` and then do the array assignment `arr['f0'][[3,0,1,2]] = arr`. This `arr` you can still print no prob, because while its elements are nested, they are only nested one level. That is also why `arr[0][0][0][0]` doesn't work. You can do `arr['f0'][[3,0,1,2]] = arr` repeatedly; each time will add one layer of nesting. In contrast, `for i in range(4): arr[i]['f0'] = arr[(i+1) % 4]`, because it creates true references, will immediately create an infinite loop. – Paul Panzer May 13 '18 at 04:26
