From what I understand, the recommended way to convert a NumPy array into a native Python list is to use ndarray.tolist.

Alas, this doesn't seem to work recursively when using structured arrays. Indeed, some ndarray objects are being referenced in the resulting list, unconverted:

>>> dtype = numpy.dtype([('position', numpy.int32, 3)])
>>> values = [([1, 2, 3],)]
>>> a = numpy.array(values, dtype=dtype)
>>> a.tolist()
[(array([1, 2, 3], dtype=int32),)]

I wrote a simple function to work around this issue:

def array_to_list(array):
    if isinstance(array, numpy.ndarray):
        return array_to_list(array.tolist())
    elif isinstance(array, list):
        return [array_to_list(item) for item in array]
    elif isinstance(array, tuple):
        return tuple(array_to_list(item) for item in array)
    else:
        return array

Which, when used, provides the expected result:

>>> array_to_list(a) == values
True

The problem with this function is that it duplicates the job of ndarray.tolist by recreating each list/tuple that it outputs. Not optimal.

So the questions are:

  • is this behaviour of ndarray.tolist to be expected?
  • is there a better way to make this happen?
ChristopherC
  • A structured array makes more sense to me as a dict with list values (or vice versa), than a list of lists. –  Sep 15 '16 at 03:19
  • A dict definitely makes sense, but so does a list of lists, because the fields of a structured array are defined within the dtype as an ordered list. On top of that, NumPy understands the `values` variable used to initialize the array even though it is not in a dict format, so a list of lists is definitely a valid structure here. – ChristopherC Sep 15 '16 at 03:34

1 Answer

Just to generalize this a bit, I'll add another field to your dtype:

In [234]: dt = np.dtype([('position', np.int32, 3),('id','U3')])

In [235]: a=np.ones((3,),dtype=dt)

The repr display does use lists and tuples:

In [236]: a
Out[236]: 
array([([1, 1, 1], '1'), ([1, 1, 1], '1'), ([1, 1, 1], '1')], 
  dtype=[('position', '<i4', (3,)), ('id', '<U3')])

but as you note, tolist does not expand the elements.

In [237]: a.tolist()
Out[237]: [(array([1, 1, 1]), '1'), (array([1, 1, 1]), '1'), 
   (array([1, 1, 1]), '1')]

Similarly, such an array can be created from the fully nested lists and tuples.

In [238]: a=np.array([([1,2,3],'str')],dtype=dt)
In [239]: a
Out[239]: 
array([([1, 2, 3], 'str')], 
  dtype=[('position', '<i4', (3,)), ('id', '<U3')])
In [240]: a.tolist()
Out[240]: [(array([1, 2, 3]), 'str')]

There's no problem recreating the array from this incomplete recursion:

In [250]: np.array(a.tolist(),dtype=dt)
Out[250]: 
array([([1, 2, 3], 'str')], 
      dtype=[('position', '<i4', (3,)), ('id', '<U3')])

This is the first time I've seen anyone use tolist with a structured array like this, but I'm not too surprised. I don't know whether the developers would consider this a bug.

Why do you need a pure list/tuple rendering of this array?

I wonder if there's a function in numpy/lib/recfunctions.py that addresses this.
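
Whether or not recfunctions has something suitable, one possible workaround is to convert field by field, so that tolist only ever runs on plain (non-structured) arrays, where it does recurse fully. This is a minimal sketch, assuming a 1-D structured array with no nested structured fields; `structured_to_list` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np


def structured_to_list(arr):
    """Convert a 1-D structured array to a list of tuples of native objects.

    Hypothetical helper, not part of NumPy. Assumes no nested structured
    fields (a field whose dtype itself has named fields).
    """
    # Each field view is a plain array, so tolist() converts it completely.
    columns = [arr[name].tolist() for name in arr.dtype.names]
    # Transpose the per-field columns back into one tuple per record.
    return list(zip(*columns))


dt = np.dtype([('position', np.int32, (3,)), ('id', 'U3')])
a = np.array([([1, 2, 3], 'str')], dtype=dt)
print(structured_to_list(a))  # [([1, 2, 3], 'str')]
```

Each list and tuple is built only once here, which avoids the double conversion mentioned in the question.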

hpaulj
  • I stumbled upon this issue when writing unit tests and wanting to compare the content of an array with a native Python list of the expected values. I know there are possibly other ways to go about this, but I liked the idea of converting the data from NumPy to Python so I could use the `assertEqual` test, whose implementation shows a diff when the values are not equal. At the end of the day, `ndarray.tolist` seems to be a valid approach to serialization, so I find it surprising that it doesn't work here. Maybe I'll file an issue on the NumPy repo to see what they say about it. – ChristopherC Sep 15 '16 at 04:17
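
For the unit-test use case described in this comment, one of the "other ways" might be to go in the opposite direction: build the expected array from the native list and compare it field by field with `numpy.testing.assert_array_equal`, which reports the mismatching elements on failure. This is only a sketch; `assert_structured_equal` is a hypothetical helper name, and it assumes the expected values can be converted to the dtype of the array under test.

```python
import numpy as np


def assert_structured_equal(actual, expected_values):
    """Compare a structured array against native Python values.

    Hypothetical test helper, not part of NumPy or unittest.
    """
    # Build the expected array with the same dtype as the array under test.
    expected = np.array(expected_values, dtype=actual.dtype)
    # Compare one field at a time so a failure names the offending field.
    for name in actual.dtype.names:
        np.testing.assert_array_equal(
            actual[name], expected[name],
            err_msg="field %r differs" % name)


dtype = np.dtype([('position', np.int32, (3,))])
a = np.array([([1, 2, 3],)], dtype=dtype)
assert_structured_equal(a, [([1, 2, 3],)])  # passes silently
```

The helper raises `AssertionError` on a mismatch, so it can be called directly inside a `unittest.TestCase` method.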