Splitting a multidimensional array in NumPy

Question

I'm trying to split a multidimensional array (array)

import numpy as np

shape = (3, 4, 4, 2)
array = np.random.randint(0,10,shape)

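into an array (new_array) in which dimension 1 of array (size 4) has been split into 2 blocks of size 2, and dimension 2 of array (size 4) has likewise been split into 2 blocks of size 2. With the two block indices first, which is what the split method below produces, new_array has shape (2, 2, 3, 2, 2, 2).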

So far I have a working method:

div_x = 2                    # number of blocks along x
div_y = 2                    # number of blocks along y
new_dim_x = shape[1]//div_x  # block size along x
new_dim_y = shape[2]//div_y  # block size along y

new_array_split = np.array([np.split(each_sub, axis=2, indices_or_sections=div_y) for each_sub in np.split(array[:, :(new_dim_x*div_x), :(new_dim_y*div_y)], axis=1, indices_or_sections=div_x)])
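As a quick sanity check of the layout the split method produces, here is a minimal sketch that re-creates the question's array (the seeded `default_rng` generator is an assumption added only for reproducibility). It confirms that the two block indices come first, followed by time, the position inside the block, and the magnitude/phase channel.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded only so the run is reproducible
shape = (3, 4, 4, 2)                # (time, x, y, magnitude/phase)
array = rng.integers(0, 10, shape)

div_x = div_y = 2                   # number of blocks along x and y
new_dim_x = shape[1] // div_x       # block size along x
new_dim_y = shape[2] // div_y       # block size along y

cropped = array[:, :new_dim_x * div_x, :new_dim_y * div_y]
new_array_split = np.array([np.split(each_sub, div_y, axis=2)
                            for each_sub in np.split(cropped, div_x, axis=1)])

# axes: (x block, y block, time, x within block, y within block, channel)
print(new_array_split.shape)        # (2, 2, 3, 2, 2, 2)
```

For example, `new_array_split[1, 0]` is the sub-image covering x = 2..3 and y = 0..1 at every time step.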

I'm also looking into using reshape:

new_array_reshape = array[:, :(div_x*new_dim_x), :(div_y*new_dim_y), ...].reshape(shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1]).transpose(1,2,0,3,4,5)

The reshape method is faster than the split method:

%timeit array[:, :(div_x*new_dim_x), :(div_y*new_dim_y), ...].reshape(shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1]).transpose(1,2,0,3,4,5)
2.16 µs ± 44.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.array([np.split(each_sub, axis=2, indices_or_sections=div_y) for each_sub in np.split(array[:, :(new_dim_x*div_x), :(new_dim_y*div_y)], axis=1, indices_or_sections=div_x)])
58.3 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
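A likely part of this gap is that the reshape/transpose chain does not copy anything here: a full-extent slice, a reshape of a contiguous array, and a transpose all return views. By contrast, `np.array(...)` over the split pieces allocates new memory. The following sketch checks this with `np.shares_memory`, reusing the question's (still mis-ordered) reshape expression purely to test view-ness, not for its values.

```python
import numpy as np

shape = (3, 4, 4, 2)
array = np.random.randint(0, 10, shape)
div_x = div_y = 2
new_dim_x, new_dim_y = shape[1] // div_x, shape[2] // div_y

# reshape + transpose: only shape/strides change, no data is copied
reshaped = array[:, :div_x * new_dim_x, :div_y * new_dim_y, ...].reshape(
    shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1]).transpose(1, 2, 0, 3, 4, 5)

# split + np.array: the pieces are stacked into freshly allocated memory
split = np.array([np.split(s, div_y, axis=2)
                  for s in np.split(array, div_x, axis=1)])

print(np.shares_memory(reshaped, array))  # True: a view
print(np.shares_memory(split, array))     # False: a copy
```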

However, I cannot get the same results; the same sub-block differs between the two methods:

print('Reshape method')
print(new_array_reshape[1,0,0,...])
print('\nSplit method')
print(new_array_split[1,0,0,...])
 
Reshape method
[[[2 2]
  [4 3]]
 [[3 5]
  [5 9]]]

Split method
[[[2 2]
  [4 3]]
 [[5 3]
  [9 8]]]

The split method does exactly what I want (I checked it number by number), but it is not as fast as I would like.

QUESTION

Is there a way to achieve the same results as the split method, using reshape or any other approach?

CONTEXT

The array is actually flow data from image processing: the first dimension of array is time, the second is the x coordinate (4), the third is the y coordinate (4), and the fourth (2) holds the magnitude and phase of the flow.

I would like to split the images (coordinates x and y) into 2x2 sub-images so I can analyse the flow more locally, compute averages, do clustering, etc.
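Once the frames are arranged as (x block, y block, time, x within block, y within block, channel), local statistics reduce to a plain reduction over the two within-block axes. Here is a minimal sketch, assuming 2x2 sub-images and building the block layout with a reshape/transpose that places each block index directly before its within-block position. The names `blocks` and `local_mean` are illustrative.

```python
import numpy as np

T, X, Y, C = 3, 4, 4, 2
h = w = 2                                  # sub-image size (2x2, as in the question)
array = np.random.randint(0, 10, (T, X, Y, C))

# (time, x, y, channel) -> (x block, y block, time, x in block, y in block, channel)
blocks = array.reshape(T, X // h, h, Y // w, w, C).transpose(1, 3, 0, 2, 4, 5)

# mean of each channel over every 2x2 sub-image, at every time step
local_mean = blocks.mean(axis=(3, 4))      # shape (2, 2, 3, 2)
print(local_mean.shape)
```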

This splitting will be performed many times, which is why I'm looking for an efficient solution. I believe reshape is probably the way to go, but I'm open to any other option.

Sounds almost like you even want an (n x n) window function over the middle two dimensions? That would make spatial averages and clustering easier, and would let you "split" by any window shape. — Daniel F, Nov 04 '20 at 10:47
Yep, that is what I want. Check Divakar's solution if you wish! — Ger, Nov 04 '20 at 10:55

score 1 · Accepted Answer · answered Nov 04 '20 at 10:43


Reshape and permute axes:

array.reshape(3,2,2,2,2,2).transpose(1,3,0,2,4,5)


Divakar

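Here is a sketch that generalises the accepted answer to arbitrary block counts. It crops any remainder the same way the question's slicing does and checks the result against the question's split method. The helper name `split_blocks` and its parameters are hypothetical, not part of NumPy or of the answer.

```python
import numpy as np

def split_blocks(arr, div_x, div_y):
    """Hypothetical helper: split axes 1 and 2 of a (T, X, Y, C) array into
    div_x * div_y blocks, returning shape (div_x, div_y, T, X//div_x, Y//div_y, C)."""
    T, X, Y, C = arr.shape
    bx, by = X // div_x, Y // div_y              # block size along x and y
    cropped = arr[:, :div_x * bx, :div_y * by]   # drop rows/columns that don't fill a block
    # each split axis unpacks as (block index, position in block), in that order
    return cropped.reshape(T, div_x, bx, div_y, by, C).transpose(1, 3, 0, 2, 4, 5)

array = np.random.randint(0, 10, (3, 5, 4, 2))   # x size 5 is not divisible by 2

blocks = split_blocks(array, 2, 2)
split = np.array([np.split(s, 2, axis=2)
                  for s in np.split(array[:, :4, :4], 2, axis=1)])
print(blocks.shape, np.array_equal(blocks, split))   # (2, 2, 3, 2, 2, 2) True
```

When cropping is actually needed, the cropped slice is no longer contiguous, so the reshape copies; when the sizes divide evenly, the result stays a view.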


I was almost there! My code was `array.reshape(3,2,2,2,2,2).transpose(1,2,0,3,4,5)`, so it seems the new dimension has to go right after the dimension being split, which makes sense. I don't know why I didn't think of that before. Thank you very much! – Ger Nov 04 '20 at 10:55
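Ger's observation can be made concrete. In C order, index x along axis 1 is stored as x = bx * block_size + xi, so the block index and the position inside the block are adjacent in memory and must be adjacent in the reshape; the same holds for y. The sketch below uses a labelled frame in which every pixel holds 10*x + y (a choice made only so positions can be read off the values) and contrasts the two orderings.

```python
import numpy as np

T, X, Y, C = 3, 4, 4, 2
div_x = div_y = 2                                # number of blocks along x and y
new_dim_x, new_dim_y = X // div_x, Y // div_y    # block size along x and y

# label every pixel with 10*x + y so its position can be read off the value
x, y = np.meshgrid(np.arange(X), np.arange(Y), indexing="ij")
array = np.broadcast_to((10 * x + y)[None, :, :, None], (T, X, Y, C)).copy()

# question's order: axis 1 (x) unpacks into the axes labelled (div_x, div_y),
# so the "y block" axis is really the position within an x block
wrong = array.reshape(T, div_x, div_y, new_dim_x, new_dim_y, C).transpose(1, 2, 0, 3, 4, 5)
# block index followed by position-in-block, for x and then for y
right = array.reshape(T, div_x, new_dim_x, div_y, new_dim_y, C).transpose(1, 3, 0, 2, 4, 5)

print(wrong[0, 0, 0, :, :, 0].tolist())   # [[0, 1], [2, 3]]: first row of the frame
print(right[0, 0, 0, :, :, 0].tolist())   # [[0, 1], [10, 11]]: top-left 2x2 block
```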

Answer 2 · answered Nov 04 '20 at 11:08 · Daniel F

For your use case I'm not sure reshape is the best option. If you want to be able to locally average and cluster, you might want a window function:

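from skimage.util import view_as_windows

def window_over(arr, size = 2, step = 2, axes = (1, 2) ):
    # window shape: full extent on every axis except the windowed ones
    wshp = list(arr.shape)
    for a in axes:
        wshp[a] = size
    # view_as_windows returns (window positions..., window shape...);
    # squeeze drops length-1 axes, e.g. the single position along non-windowed axes
    return view_as_windows(arr, wshp, step).squeeze()

# test is the (3, 4, 4, 2) array from the question
window_over(test).shape
Out[]: (2, 2, 3, 2, 2, 2)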
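If scikit-image is not available, NumPy 1.20+ provides `numpy.lib.stride_tricks.sliding_window_view`, which can play a similar role. This is a swapped-in alternative, not what the answer uses. It always steps by 1, so non-overlapping blocks are obtained by striding over the window-position axes. Because it appends the window axes at the end, a transpose is needed to reach the same layout as the split method. A sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

array = np.random.randint(0, 10, (3, 4, 4, 2))
size = step = 2

# windows over axes 1 and 2 only: shape (3, 3, 3, 2, size, size)
# = (time, x position, y position, channel, x in window, y in window)
win = sliding_window_view(array, (size, size), axis=(1, 2))
# keep every `step`-th window position, then reorder to
# (x block, y block, time, x in block, y in block, channel)
blocks = win[:, ::step, ::step].transpose(1, 2, 0, 4, 5, 3)
print(blocks.shape)  # (2, 2, 3, 2, 2, 2)
```

Like `view_as_windows`, this returns a read-only view rather than a copy.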

Your output axes can then be rearranged how you want using transpose. The benefit of this is that you can get the intermediate windows:

window_over(test, step = 1).shape
Out[]: (3, 3, 3, 2, 2, 2)

That includes the overlapping 2x2 windows, so you get a 3x3 grid of windows.
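With overlapping windows, a local (moving) average is again just a reduction over the window axes. The sketch below uses NumPy's `sliding_window_view` as a stand-in for the answer's skimage-based `window_over`; this substitution is an assumption, and the window axes end up last rather than in the answer's order.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

array = np.random.randint(0, 10, (3, 4, 4, 2)).astype(float)

# every 2x2 window with step 1: shape (3, 3, 3, 2, 2, 2)
# = (time, x position, y position, channel, x in window, y in window)
win = sliding_window_view(array, (2, 2), axis=(1, 2))
moving_mean = win.mean(axis=(-2, -1))   # shape (3, 3, 3, 2): a 3x3 grid of local means
print(moving_mean.shape)
```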

Since overlapping is possible (step = 1), the dimension size also doesn't need to be divisible by the window size:

window_over(test, size = 3, step = 1).shape
Out[]: (2, 2, 3, 3, 3, 2)
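The window counts above follow the usual sliding-window formula. For an axis of length n, window size w and step s (symbols introduced here for illustration), the number of window positions is:

$$
n_{\text{win}} = \left\lfloor \frac{n - w}{s} \right\rfloor + 1
$$

For n = 4: w = 2, s = 2 gives 2; w = 2, s = 1 gives 3; w = 3, s = 1 gives 2; and w = 3, s = 2 gives only 1. Along the time axis (n = 3, w = 3) there is always a single position, which `squeeze()` removes.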

Hi Daniel F. Can you expand on why I should not use reshape? I guess with a window function you just get views of the array, without modifying it. Is that what you mean? — Ger, Nov 04 '20 at 11:52
Also, very nice answer! I did have a look at the function `view_as_windows`, but I don't think I quite grasped its usefulness. — Ger, Nov 04 '20 at 11:54
