
I would like to convert an array with many dimensions (more than 2) into a 2D object array in which the remaining dimensions become nested, stand-alone arrays.

So if I have an array like numpy.arange(3 * 4 * 5 * 5 * 5).reshape((3, 4, 5, 5, 5)), I would like to convert it to an array of shape (3, 4), where each element would be an array of shape (5, 5, 5). The dtype of the outer array would be object.

For example, for np.arange(8).reshape((1, 1, 2, 2, 2)), the output would be equivalent to:

a = np.ndarray(shape=(1,1), dtype=object)
a[0, 0] = np.arange(8).reshape((1, 1, 2, 2, 2))[0, 0, :, :, :]

How can I do this efficiently?

Mitar
  • Could you please show example code of what you have already tried? – JE_Muc May 29 '18 at 10:20
  • Can you show the expected output with, let's say, `np.arange(8).reshape((1, 1, 2, 2, 2))`? Your example is unnecessarily large. – zipa May 29 '18 at 10:24
  • Added expected output example. – Mitar May 29 '18 at 10:35
  • A similar SO question, with a clever (though slower) answer using `frompyfunc`, [Force numpy to create array of objects](https://stackoverflow.com/questions/49064548/force-numpy-to-create-array-of-objects/49104269#49104269) – hpaulj May 29 '18 at 18:37

2 Answers


We can reshape and assign elements from the regular array into the output object-dtype array in a single loop, which seems to be a tad faster than using two nested loops, like so -

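import numpy as np

def reshape_approach(a):
    m,n = a.shape[:2]
    # Merge the first two axes in place (works only if no copy is needed)
    a.shape = (m*n,) + a.shape[2:]
    out = np.empty((m*n),dtype=object)
    # Store each sub-array (a view into `a`) as one object element
    for i in range(m*n):
        out[i] = a[i]
    out.shape = (m,n)
    # Restore the caller's original shape
    a.shape = (m,n) + a.shape[1:]
    return out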
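`reshape_approach` temporarily assigns to `a.shape`, which modifies the caller's array and, as far as I know, raises an error when the first two axes cannot be merged without a copy (for example on a transposed array). A minimal sketch of a variant that uses `a.reshape` instead; the name `reshape_approach_nonmutating` is hypothetical and not part of the answer:

```python
import numpy as np


def reshape_approach_nonmutating(a):
    """Hypothetical variant of reshape_approach that never assigns to a.shape.

    a.reshape returns a view when the first two axes can be merged without
    copying, and a copy otherwise, so the input is left untouched.
    """
    m, n = a.shape[:2]
    flat = a.reshape((m * n,) + a.shape[2:])
    out = np.empty(m * n, dtype=object)
    for i in range(m * n):
        out[i] = flat[i]  # store each trailing-shape block as one object
    return out.reshape(m, n)


a = np.arange(2 * 3 * 2 * 2).reshape(2, 3, 2, 2)
out = reshape_approach_nonmutating(a)
print(out.shape)                           # (2, 3)
print(out[1, 2].shape)                     # (2, 2)
print(np.array_equal(out[1, 2], a[1, 2]))  # True
print(a.shape)                             # (2, 3, 2, 2)

# Also works on a non-contiguous input (first two axes swapped).
t = a.transpose(1, 0, 2, 3)
print(reshape_approach_nonmutating(t).shape)  # (3, 2)
```

The trade-off is that for non-contiguous input the elements are views into a copy rather than into `a` itself.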

Runtime test

Other approach(es) -

# @Scotty1-'s solution
def simply_assign(a):
    m,n = a.shape[:2]
    out = np.empty((m,n),dtype=object)
    for i in range(m):
        for j in range(n):
            out[i,j] = a[i,j]
    return out

Timings -

In [154]: m,n = 300,400
     ...: a = np.arange(m * n * 5 * 5 * 5).reshape((m,n, 5, 5, 5))

In [155]: %timeit simply_assign(a)
10 loops, best of 3: 39.4 ms per loop

In [156]: %timeit reshape_approach(a)
10 loops, best of 3: 32.9 ms per loop
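The `%timeit` lines are IPython magics. To repeat the comparison in a plain Python script, a sketch along these lines with the standard `timeit` module should work; the helper `time_both` is hypothetical, and the absolute numbers will depend on the machine and NumPy version:

```python
import timeit

import numpy as np


def simply_assign(a):
    # Two nested loops over the outer (m, n) axes, as in the answer.
    m, n = a.shape[:2]
    out = np.empty((m, n), dtype=object)
    for i in range(m):
        for j in range(n):
            out[i, j] = a[i, j]
    return out


def reshape_approach(a):
    # Single loop over the merged first two axes, as in the answer.
    m, n = a.shape[:2]
    a.shape = (m * n,) + a.shape[2:]
    out = np.empty(m * n, dtype=object)
    for i in range(m * n):
        out[i] = a[i]
    out.shape = (m, n)
    a.shape = (m, n) + a.shape[1:]
    return out


def time_both(m, n, number=10, repeat=3):
    """Hypothetical helper: best time per call, in ms, for each approach."""
    a = np.arange(m * n * 5 * 5 * 5).reshape((m, n, 5, 5, 5))
    results = {}
    for func in (simply_assign, reshape_approach):
        times = timeit.repeat(lambda: func(a), number=number, repeat=repeat)
        results[func.__name__] = min(times) / number * 1e3
    return results


for name, ms in time_both(300, 400).items():
    print(f"{name}: {ms:.1f} ms per loop")
```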

With 7D data -

In [160]: m,n,p,q = 30,40,30,40
     ...: a = np.arange(m * n *p * q * 5 * 5 * 5).reshape((m,n,p,q, 5, 5, 5))

In [161]: %timeit simply_assign(a)
1000 loops, best of 3: 421 µs per loop

In [162]: %timeit reshape_approach(a)
1000 loops, best of 3: 316 µs per loop
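Both functions split off only the first two axes, so with the 7D input each element of the (m, n) object array is a whole (p, q, 5, 5, 5) block, and the Python loop runs only m * n = 1200 times instead of 120,000. That is why these timings are so much smaller than the 5D ones. A small sketch with reduced sizes (chosen here only for illustration) shows the resulting shapes:

```python
import numpy as np


def simply_assign(a):
    # Same double loop as above: only the first two axes are iterated.
    m, n = a.shape[:2]
    out = np.empty((m, n), dtype=object)
    for i in range(m):
        for j in range(n):
            out[i, j] = a[i, j]
    return out


m, n, p, q = 3, 4, 2, 3  # small illustrative sizes, not the benchmark ones
a = np.arange(m * n * p * q * 5 * 5 * 5).reshape((m, n, p, q, 5, 5, 5))
out = simply_assign(a)
print(out.shape)        # (3, 4)
print(out[0, 0].shape)  # (2, 3, 5, 5, 5)
print(out.size)         # 12, i.e. m * n loop iterations
```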
Divakar
  • Good solution, but IMHO too complicated for a simple task if efficiency is not essential for survival. What could be even faster is doing `x.reshape(-1)` and then using strides to avoid loops completely. I'll take a look into this later on. – JE_Muc May 29 '18 at 10:59
  • @Scotty1- Well, the OP is asking for performance, so I don't see why a few extra steps would hurt :) – Divakar May 29 '18 at 11:01
  • The reshape loop could be replaced by `out[:] = list(a)`, though the timing is essentially the same. – hpaulj May 29 '18 at 15:33
  • @hpaulj Yup, tried that, but that's noticeably slower. – Divakar May 29 '18 at 15:33
  • @Mitar What do you mean assign to shape? – Divakar May 29 '18 at 16:52
  • I didn't know that you can do `out.shape = (m,n)`. I always thought you have to call `reshape`. – Mitar May 30 '18 at 01:49
  • @Mitar Yeah, that's just a shorter alternative to reshape arrays in-place. – Divakar May 30 '18 at 06:41
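To illustrate the last two comments: `arr.shape = ...` changes the shape of the existing array object in place, whereas `arr.reshape(...)` returns a new array (a view when possible) and leaves `arr` unchanged. A minimal sketch; note that the NumPy documentation discourages assigning to `shape` in favor of `reshape`, and in-place assignment only works when no copy is needed:

```python
import numpy as np

out = np.arange(12)

# reshape returns a new array object; `out` itself keeps its shape.
r = out.reshape(3, 4)
print(out.shape, r.shape)        # (12,) (3, 4)
print(np.shares_memory(out, r))  # True: r is a view of the same data

# Assigning to .shape modifies `out` itself; no new array is created.
out.shape = (3, 4)
print(out.shape)                 # (3, 4)
```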

Thanks for the hint, Mitar. This is how it should look using dtype=object arrays:

import numpy as np

# x has shape (m, n, ...); each outer element holds the sub-array x[i, j]
outer_array = np.empty((x.shape[0], x.shape[1]), dtype=object)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        outer_array[i, j] = x[i, j]

Looping may not be the most efficient way to do it, but as far as I know there is no vectorized operation for this task.

Using some more reshaping, I expected the following to be even faster than Divakar's solution. Edit: it isn't; Divakar's is faster. Nice solution, Divakar!

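def advanced_reshape_solution(x):
    m, n = x.shape[:2]
    # Elements per trailing (i, j) block; int() because np.prod(()) is 1.0
    sub_arr_size = int(np.prod(x.shape[2:]))
    out_array = np.empty((m * n), dtype=object)
    # 1-D view of x (a copy if x is not C-contiguous)
    x_flat_view = x.reshape(-1)
    for i in range(m*n):
        # Slice out the i-th block and restore its trailing shape
        out_array[i] = x_flat_view[i * sub_arr_size:(i + 1) * sub_arr_size].reshape(x.shape[2:])
    return out_array.reshape((m, n))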
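One property of `advanced_reshape_solution` worth knowing: for a C-contiguous `x`, `x.reshape(-1)` is a view, so every element of the result shares memory with `x`; for a non-contiguous `x` (here a transposed array), `reshape(-1)` has to copy, so the values are still correct but the elements no longer share memory with `x`. A small sketch, repeating the function so the snippet runs on its own:

```python
import numpy as np


def advanced_reshape_solution(x):
    # Same approach as above: cut a flat 1-D version of x into blocks.
    m, n = x.shape[:2]
    sub_shape = x.shape[2:]
    sub_arr_size = int(np.prod(sub_shape))
    out_array = np.empty(m * n, dtype=object)
    x_flat = x.reshape(-1)  # a view if x is C-contiguous, otherwise a copy
    for i in range(m * n):
        out_array[i] = x_flat[i * sub_arr_size:(i + 1) * sub_arr_size].reshape(sub_shape)
    return out_array.reshape((m, n))


x = np.arange(2 * 3 * 2 * 2).reshape((2, 3, 2, 2))
out = advanced_reshape_solution(x)
print(np.shares_memory(out[1, 2], x))  # True: blocks are views into x

xt = x.transpose(1, 0, 2, 3)  # non-contiguous, shape (3, 2, 2, 2)
out_t = advanced_reshape_solution(xt)
print(np.array_equal(out_t[2, 1], xt[2, 1]))  # True: values are correct
print(np.shares_memory(out_t[2, 1], xt))      # False: built from a copy
```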
JE_Muc