
I have a simple list of elements and I'm trying to make a structured array out of it.

This naive approach fails:

y = np.array([1,2,3], dtype=[('y', float)])
TypeError: expected an object with a buffer interface

Putting each element in a tuple works:

# Manual way
y = np.array([(1,), (2,), (3,)], dtype=[('y', float)])
# Comprehension
y = np.array([(x,) for x in [1,2,3]], dtype=[('y', float)])

It also works if I create an array from the list first:

y = np.array(np.array([1,2,3]), dtype=[('y', float)])

I'm a bit puzzled. How come the latter works but numpy couldn't sort things out when provided a simple list?

What is the recommended way? Creating that intermediate array might not have a great performance impact, but isn't this suboptimal?

I'm also surprised that those won't work:

# All lists
y = np.array([[1,], [2,], [3,]], dtype=[('y', float)])
TypeError: expected an object with a buffer interface
# All tuples
y = np.array(((1,), (2,), (3,)), dtype=[('y', float)])
ValueError: size of tuple must match number of fields.

I'm new to structured arrays and I don't remember numpy being that picky about input types. There must be something I'm missing.

Jérôme
  • Because rows have to be assigned to using tuples, because each element of a structured array is a *struct*, so there is some kind of compound datatype. The alternative is to use a buffer (which is why `np.array` works). – juanpa.arrivillaga Apr 24 '17 at 09:30
  • This is sort of documented [here](https://docs.scipy.org/doc/numpy/user/basics.rec.html#filling-structured-arrays) – juanpa.arrivillaga Apr 24 '17 at 09:32
  • And the previous docs paragraph mentions ` Notice that x is created with a list of tuples.`. This input style matches the display style. I prefer your list comprehension approach. Or filling a preallocated array field by field. – hpaulj Apr 24 '17 at 13:33

2 Answers


The details of how np.array handles various inputs are buried in compiled code. As the many questions about creating object dtype arrays show, it can be complicated and confusing. The basic model is to create a multidimensional numeric array from a nested list.

np.array([[1,2,3],[4,5,6]])

In implementing structured arrays, developers adopted the tuple as a way of distinguishing a record from just another nested dimension. That is evident in the display of a structured array.

It is also a requirement when defining a structured array, though the list of tuples requirement is somewhat buried in the documentation.

In [381]: alist = [[1],[2],[3]]
In [382]: dt=np.dtype([('y',int)])
In [383]: np.array(alist,dt)

TypeError: a bytes-like object is required, not 'int'

This is the error message from my version, '1.12.0'. It appears to be different in yours.

As you note, a list comprehension can convert the nested list into a list of tuples.

In [384]: np.array([tuple(i) for i in alist],dt)
Out[384]: 
array([(1,), (2,), (3,)], 
      dtype=[('y', '<i4')])

In answering SO questions that's the approach I use most often. Either that or iteratively set fields of a preallocated array (usually there are a lot more records than fields, so that loop is not expensive).
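The preallocate-and-fill alternative mentioned above could look roughly like this. It is only a sketch, assuming the single float field 'y' from the question; the variable names are illustrative.

```python
import numpy as np

values = [1, 2, 3]                       # plain list, as in the question
dt = np.dtype([('y', float)])            # single-field structured dtype

# Allocate the records first, then assign one whole field at a time.
y = np.zeros(len(values), dtype=dt)
y['y'] = values

print(y.shape)   # (3,)
print(y['y'])    # [1. 2. 3.]
```

With several fields, the same assignment is repeated once per field name, so no tuple wrapping is needed.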

It looks like wrapping the array in a structured array call is equivalent to an astype call:

In [385]: np.array(np.array(alist),dt)
Out[385]: 
array([[(1,)],
       [(2,)],
       [(3,)]], 
      dtype=[('y', '<i4')])
In [386]: np.array(alist).astype(dt)
Out[386]: 
array([[(1,)],
       [(2,)],
       [(3,)]], 
      dtype=[('y', '<i4')])

But note the change in the number of dimensions. The list of tuples created a (3,) array. The astype converted a (3,1) numeric array into a (3,1) structured array.
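To make the shape difference concrete, here is a small sketch comparing the two routes. It assumes alist is the nested list [[1],[2],[3]] and that astype to a structured dtype behaves as in the session above; the names from_tuples, from_astype and flat are made up for illustration.

```python
import numpy as np

alist = [[1], [2], [3]]                  # assumed value of alist
dt = np.dtype([('y', int)])

# Route 1: list of tuples, one tuple per record.
from_tuples = np.array([tuple(i) for i in alist], dtype=dt)

# Route 2: numeric array first, then cast to the structured dtype.
from_astype = np.array(alist).astype(dt)

print(from_tuples.shape)   # (3,)
print(from_astype.shape)   # (3, 1)

# Flattening the astype result gives the same 1-D layout of records.
flat = from_astype.ravel()
print(flat.shape)          # (3,)
```

So if the astype route is used, the extra dimension has to be removed explicitly.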

Part of what the tuples tell np.array is where to put the division between array dimensions and records. It interprets

[(3,), (1,), (2,)]

as

[record, record, record]

whereas automatic translation of [[1],[2],[3]] might produce

[[record],[record],[record]]

When the dtype is numeric (non-structured), it ignores the distinction between list and tuple:

In [388]: np.array([tuple(i) for i in alist],int)
Out[388]: 
array([[1],
       [2],
       [3]])

But when the dtype is compound, developers have chosen to use the tuple layer as significant information.
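A short sketch of that contrast, using the same single-field dtype as above (variable names are illustrative only):

```python
import numpy as np

# Numeric dtype: lists and tuples are interchangeable nesting levels.
as_lists = np.array([[1], [2], [3]], dtype=int)
as_tuples = np.array([(1,), (2,), (3,)], dtype=int)
print(np.array_equal(as_lists, as_tuples))   # True
print(as_tuples.shape)                       # (3, 1)

# Structured dtype: each tuple is consumed as one record,
# so the tuple level does not become an array dimension.
records = np.array([(1,), (2,), (3,)], dtype=[('y', int)])
print(records.shape)                         # (3,)
```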


Consider a more complex structured dtype

In [389]: dt1=np.dtype([('y',int,(2,))])
In [390]: np.ones((3,), dt1)
Out[390]: 
array([([1, 1],), ([1, 1],), ([1, 1],)], 
      dtype=[('y', '<i4', (2,))])
In [391]: np.array([([1,2],),([3,4],)])
Out[391]: 
array([[[1, 2]],

       [[3, 4]]])
In [392]: np.array([([1,2],),([3,4],)], dtype=dt1)
Out[392]: 
array([([1, 2],), ([3, 4],)], 
      dtype=[('y', '<i4', (2,))])

The display (and input) has lists within tuples within a list. And that's just the start:

In [393]: dt1=np.dtype([('x',dt,(2,))])
In [394]: dt1
Out[394]: dtype([('x', [('y', '<i4')], (2,))])
In [395]: np.ones((2,),dt1)
Out[395]: 
array([([(1,), (1,)],), ([(1,), (1,)],)], 
      dtype=[('x', [('y', '<i4')], (2,))])
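As a rough illustration of how these nesting levels map onto shapes, the sketch below rebuilds the two dtypes under separate names (dt_sub and dt_nested are hypothetical names, used here to avoid reusing dt1) and inspects the fields:

```python
import numpy as np

dt = np.dtype([('y', int)])
dt_sub = np.dtype([('y', int, (2,))])       # field holding a length-2 subarray
dt_nested = np.dtype([('x', dt, (2,))])     # field holding two nested records

# List of tuples; each tuple holds one list for the subarray field.
a = np.array([([1, 2],), ([3, 4],)], dtype=dt_sub)
print(a.shape, a['y'].shape)      # (2,) (2, 2)

b = np.ones((2,), dtype=dt_nested)
print(b.shape, b['x'].shape)      # (2,) (2, 2)
print(b['x']['y'].shape)          # (2, 2)
```

The outer list gives the array dimension, each tuple gives a record, and the inner list fills the subarray dimension of the field.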

See also the related question: convert list of tuples to structured numpy array

hpaulj
  • Thank you for this comprehensive answer. It makes more sense now. I was a bit surprised because, in my modest experience, numpy was "I'm-a-scientist-with-no-python-background-and-I-can-do-numpy" easy, with few surprises, and it appears that manipulating structured arrays requires much more care and understanding of what happens inside. – Jérôme Apr 26 '17 at 08:28

The np.array() function accepts a list of lists as input. So if you want to create a 2 x 2 matrix, for example, this is what you need to do:

X = np.array([[1,2], [3,4]])
Babatunde Mustapha