
From a 3D array of shape (M, N, P) and dtype int, I would like to get a 2D array of shape (N, P) and dtype object, where each element holds the M values along the first axis, and have this done with reasonable efficiency.

I'm happy with the objects being of either tuple, list or numpy.ndarray types.

I have a working but hacky solution that has to go via a list, so it feels like I'm missing something:

import numpy as np

m = np.mgrid[:8, :12]

l = list(zip(*(v.ravel() for v in m)))  # zip is lazy in Python 3, so materialize it
a2 = np.empty(m.shape[1:], dtype=object)
a2.ravel()[:] = l

In this example, the final array a2 should have the property that a2[(x, y)] == (x, y).
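A quick check of that property, as a minimal sketch assuming Python 3 and a NumPy version without the np.object alias (plain object is used instead). The name pairs is just an illustrative local variable.

```python
import numpy as np

m = np.mgrid[:8, :12]  # shape (2, 8, 12)

# zip returns a lazy iterator in Python 3, so it is turned into a list first
pairs = list(zip(*(v.ravel() for v in m)))
a2 = np.empty(m.shape[1:], dtype=object)
a2.ravel()[:] = pairs  # a2 is freshly created and contiguous, so ravel() is a view

# every element should be a tuple equal to its own index
ok = all(a2[x, y] == (x, y) for x, y in np.ndindex(a2.shape))
print(a2.shape, type(a2[2, 4]), ok)  # (8, 12) <class 'tuple'> True
```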

It feels like it should have been possible to transpose m and make a2 like this:

a2 = m.transpose(1,2,0).astype(object).reshape(m.shape[1:])
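A minimal sketch of why this attempt fails: astype(object) changes what each element is, but not how many elements there are, so reshape is asked to fit 192 elements into 96 slots.

```python
import numpy as np

m = np.mgrid[:8, :12]
t = m.transpose(1, 2, 0).astype(object)

# astype(object) only changes the element type; the shape stays 3-D
print(t.shape, t.size)  # (8, 12, 2) 192

failed = False
try:
    t.reshape(m.shape[1:])  # 192 elements cannot fit into (8, 12) = 96 slots
except ValueError as err:
    failed = True
    print("ValueError:", err)
```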

since NumPy doesn't really care what's inside the objects. Alternatively, when creating a NumPy array of dtype object, it would be nice to be able to tell it how many dimensions there should be:

a2 = np.array(m.transpose(1,2,0), dtype=object, ndim=2)  # hypothetical: np.array has no ndim parameter

NumPy knows to stop before the final depth of nested iterables if they have different lengths at the third level (in this example), but since m has no irregularities, this seems impossible.
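A small sketch contrasting the two cases with an explicit dtype=object (recent NumPy versions raise an error for ragged input unless dtype=object is given). The ragged list here is a made-up example, not data from the question.

```python
import numpy as np

m = np.mgrid[:8, :12]

# Regular input: np.array descends as deep as the nesting is uniform
regular = np.array(m.transpose(1, 2, 0), dtype=object)
print(regular.shape)  # (8, 12, 2)

# Ragged input: NumPy stops at the first level where lengths disagree
# and keeps the inner lists as objects
ragged = np.array([[0, 0], [0, 1, 2]], dtype=object)
print(ragged.shape, type(ragged[0]))  # (2,) <class 'list'>
```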

Or create a2 and fill it with the transposed array:

a2 = np.empty(m.shape[1:], dtype=object)
a2[...] = m.transpose(1, 2, 0)

In this case, e.g., m.transpose(1, 2, 0)[2, 4] is np.array([2, 4]), and assigning that alone to a2[2, 4] would be perfectly legal. However, none of these three seemingly more reasonable attempts works.
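A minimal sketch of that asymmetry: storing one 2-element array in one object slot works, while assigning the whole 3-D array is treated as broadcasting (8, 12, 2) into (8, 12), which NumPy rejects.

```python
import numpy as np

m = np.mgrid[:8, :12]
t = m.transpose(1, 2, 0)  # shape (8, 12, 2)
a2 = np.empty(m.shape[1:], dtype=object)

# Assigning a single element is fine: the array object is stored as-is
a2[2, 4] = t[2, 4]
print(type(a2[2, 4]), a2[2, 4])  # <class 'numpy.ndarray'> [2 4]

# Assigning the whole array is a broadcast from (8, 12, 2) to (8, 12)
failed = False
try:
    a2[...] = t
except ValueError as err:
    failed = True
    print("ValueError:", err)
```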

deinonychusaur

1 Answer


So for a smaller m:

In [513]: m = np.mgrid[:3,:4]
In [514]: m.shape
Out[514]: (2, 3, 4)
In [515]: m
Out[515]: 
array([[[0, 0, 0, 0],
        [1, 1, 1, 1],
        [2, 2, 2, 2]],

       [[0, 1, 2, 3],
        [0, 1, 2, 3],
        [0, 1, 2, 3]]])
In [516]: ll = list(zip(*(v.ravel() for v in m)))
In [517]: ll
Out[517]: 
[(0, 0),
 (0, 1),
 (0, 2),
 ...
 (2, 3)]
In [518]: a2=np.empty(m.shape[1:], dtype=object)
In [519]: a2.ravel()[:] = ll
In [520]: a2
Out[520]: 
array([[(0, 0), (0, 1), (0, 2), (0, 3)],
       [(1, 0), (1, 1), (1, 2), (1, 3)],
       [(2, 0), (2, 1), (2, 2), (2, 3)]], dtype=object)

Making an empty array of the right shape and filling it via [:]= is the best way of controlling the object depth of such an array. np.array(...) defaults to the highest possible dimension, which in this case would be 3d.
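A short sketch of the difference, using the small m from above. With np.array, even dtype=object does not stop the tuples from being unpacked; slice assignment into an existing object array only unpacks as many levels as the target has dimensions.

```python
import numpy as np

m = np.mgrid[:3, :4]
ll = list(zip(*(v.ravel() for v in m)))  # 12 tuples of 2 values

# np.array goes as deep as it can, even with dtype=object
print(np.array(ll, dtype=object).shape)  # (12, 2)

# np.empty + slice assignment: each tuple becomes one element
a1 = np.empty(len(ll), dtype=object)
a1[:] = ll
print(a1.shape, type(a1[5]))  # (12,) <class 'tuple'>
```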

So the main question is whether there's a better way of constructing that ll list of tuples.

 a2.ravel()[:] = np.array(ll)

does not work, complaining that it could not broadcast from shape (12,2) into shape (12,).

Working backwards, if I start with an array like np.array(ll) and turn it into a nested list, the assignment works, except that the elements of a2 are lists, not tuples:

In [533]: a2.ravel()[:] = np.array(ll).tolist()
In [534]: a2
Out[534]: 
array([[[0, 0], [0, 1], [0, 2], [0, 3]],
       [[1, 0], [1, 1], [1, 2], [1, 3]],
       [[2, 0], [2, 1], [2, 2], [2, 3]]], dtype=object)

m's shape is (2,3,4) and np.array(ll)'s shape is (12,2), so m.reshape(2,-1).T produces the same thing.

a2.ravel()[:] = m.reshape(2,-1).T.tolist()

I could also have transposed first and then reshaped: m.transpose(1,2,0).reshape(-1,2).
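A quick check, as a sketch, that the three ways of getting the (12, 2) array of coordinate pairs agree element for element:

```python
import numpy as np

m = np.mgrid[:3, :4]  # shape (2, 3, 4)
ll = list(zip(*(v.ravel() for v in m)))

from_list = np.array(ll)                              # (12, 2)
reshape_then_T = m.reshape(2, -1).T                   # (12, 2)
T_then_reshape = m.transpose(1, 2, 0).reshape(-1, 2)  # (12, 2)

print(np.array_equal(from_list, reshape_then_T),
      np.array_equal(from_list, T_then_reshape))  # True True
```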

To get tuples, I need to pass the reshaped array through a comprehension:

a2.ravel()[:] = [tuple(l) for l in m.reshape(2,-1).T]
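A side-by-side sketch of the two fills: tolist() stores each row as a Python list, while the comprehension stores each row as a tuple.

```python
import numpy as np

m = np.mgrid[:3, :4]
a_lists = np.empty(m.shape[1:], dtype=object)
a_tuples = np.empty(m.shape[1:], dtype=object)

# tolist() turns each row of the (12, 2) array into a Python list
a_lists.ravel()[:] = m.reshape(2, -1).T.tolist()
# the comprehension wraps each row in a tuple instead
a_tuples.ravel()[:] = [tuple(row) for row in m.reshape(2, -1).T]

print(type(a_lists[1, 2]), type(a_tuples[1, 2]))  # <class 'list'> <class 'tuple'>
print(a_lists[1, 2] == [1, 2], a_tuples[1, 2] == (1, 2))  # True True
```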

===============

m.transpose(1,2,0).astype(object) is still 3d; it has just replaced the integers with pointers to integer objects. There's a 'wall' between the array dimensions and the dtype. Operations like reshape and transpose only act on the dimensions; they don't penetrate that wall or move it. Lists are pointers all the way down. Object arrays use pointers only at the dtype level.
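A small sketch of that 'wall': after astype(object) the dimensions are unchanged, each individual element is a Python int, and indexing only two of the three axes still returns an array rather than a tuple or list.

```python
import numpy as np

m = np.mgrid[:3, :4]
t = m.transpose(1, 2, 0).astype(object)

print(t.shape)           # (3, 4, 2): the dimensions are unchanged
print(type(t[1, 2, 0]))  # <class 'int'>: each element now points to a Python int
print(type(t[1, 2]))     # <class 'numpy.ndarray'>: a 1-D object array, not a tuple
```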

Don't be afraid of the a2.ravel()[:]= expression. ravel is a cheap reshape (it returns a view whenever the array is contiguous, as a freshly created np.empty array is), and assignment to a flattened version of an array may actually be faster than assignment to the 2d version. After all, the data (in this case pointers) is stored in a flat data buffer.
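A minimal sketch of when that write-through via ravel() works: on a fresh C-contiguous array ravel() is a view, but on a non-contiguous view such as a transpose it has to copy, and assigning into that copy would leave the original untouched.

```python
import numpy as np

a2 = np.empty((3, 4), dtype=object)

# a freshly created array is C-contiguous, so ravel() returns a view
flat = a2.ravel()
print(np.shares_memory(a2, flat))  # True
flat[:] = list(range(12))          # writes through to a2
print(a2[2, 3])                    # 11

# a transposed view is not C-contiguous, so ravel() returns a copy
print(np.shares_memory(a2.T, a2.T.ravel()))  # False
```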

But (after playing around a bit) I found I can do the assignment without the ravel or reshape (I still need the tolist to move the object boundary). The list nesting has to match the a2 shape down to the 'object' level.

a2[...] = m.transpose(1,2,0).tolist()   # even a2[:] works
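A self-contained sketch of that assignment on the small m: the nested list is 3x4 with 2-element lists innermost, and the assignment stops unpacking at a2's two dimensions, so each element is a Python list.

```python
import numpy as np

m = np.mgrid[:3, :4]
a2 = np.empty(m.shape[1:], dtype=object)

# tolist() gives a 3x4 nested list whose innermost items are 2-element lists;
# the assignment unpacks only as many levels as a2 has dimensions
a2[...] = m.transpose(1, 2, 0).tolist()

print(a2.shape, a2[1, 2])  # (3, 4) [1, 2]
```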

(This brings to mind a discussion about giving np.array a maxdim parameter: see the question "Prevent numpy from creating a multidimensional array".)

The use of tolist seems like an inefficiency. But if the elements of a2 are tuples (or rather pointers to tuples), those tuples have to be created somehow. The C data buffer of m cannot be viewed as a set of tuples. tolist (with the [tuple...] comprehension) might well be the most efficient way of creating such objects.
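One way to check that claim on a given machine is a rough timing sketch like the one below. The function names and the 300x400 grid size are arbitrary choices for illustration, and no timings are quoted here since they depend on hardware and NumPy version.

```python
import timeit

import numpy as np

m = np.mgrid[:300, :400]  # grid size chosen arbitrarily for illustration


def via_zip():
    # the question's approach: zip over the raveled coordinate arrays
    a2 = np.empty(m.shape[1:], dtype=object)
    a2.ravel()[:] = list(zip(*(v.ravel() for v in m)))
    return a2


def via_tolist_tuples():
    # tolist first (Python ints), then wrap each row in a tuple
    a2 = np.empty(m.shape[1:], dtype=object)
    a2.ravel()[:] = [tuple(row) for row in m.reshape(2, -1).T.tolist()]
    return a2


def via_tolist_lists():
    # no comprehension: elements end up as lists instead of tuples
    a2 = np.empty(m.shape[1:], dtype=object)
    a2[...] = m.transpose(1, 2, 0).tolist()
    return a2


for fn in (via_zip, via_tolist_tuples, via_tolist_lists):
    print(fn.__name__, timeit.timeit(fn, number=5))
```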

==============

Did I note that the transpose can be indexed, producing 2-element arrays with the right numbers?

In [592]: m.transpose(1,2,0)[1,2]
Out[592]: array([1, 2])
In [593]: m.transpose(1,2,0)[0,1]
Out[593]: array([0, 1])
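Since the question also accepts numpy.ndarray elements, those indexed pieces could be stored directly. A minimal sketch, assuming a plain Python loop over np.ndindex is acceptable (it is not vectorized). Note that each stored element is a view into m, so later changes to m show up in a2.

```python
import numpy as np

m = np.mgrid[:3, :4]
t = m.transpose(1, 2, 0)  # t[i, j] is a 2-element view into m
a2 = np.empty(t.shape[:2], dtype=object)

# store each small view as one object element
for idx in np.ndindex(a2.shape):
    a2[idx] = t[idx]

print(type(a2[1, 2]), a2[1, 2])       # <class 'numpy.ndarray'> [1 2]
print(np.shares_memory(a2[1, 2], m))  # True: the elements are views, not copies
```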

==================

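Since tolist on a structured array produces tuples, I could view each pair as a two-field record (the field dtypes have to match m.dtype for the view to line up):

In [598]: a2[:]=m.transpose(1,2,0).copy().view([('f0', m.dtype), ('f1', m.dtype)]).reshape(a2.shape).tolist()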

In [599]: a2
Out[599]: 
array([[(0, 0), (0, 1), (0, 2), (0, 3)],
       [(1, 0), (1, 1), (1, 2), (1, 3)],
       [(2, 0), (2, 1), (2, 2), (2, 3)]], dtype=object)

and thus avoid the list comprehension. It's not necessarily simpler or faster.
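A variant of the structured-array route, sketched with numpy.lib.recfunctions.unstructured_to_structured, which turns the last axis into fields whose dtype is taken from the input, so no field width has to be guessed. This is a swapped-in helper, not what the answer used above.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

m = np.mgrid[:3, :4]
a2 = np.empty(m.shape[1:], dtype=object)

# the last axis (length 2) becomes two fields of a structured array
s = rfn.unstructured_to_structured(m.transpose(1, 2, 0))
print(s.shape)  # (3, 4)

# tolist() on a structured array yields tuples of Python scalars
a2[...] = s.tolist()
print(a2[1, 2])  # (1, 2)
```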

hpaulj