I'm trying to split a multidimensional array (array):
import numpy as np
shape = (3, 4, 4, 2)
array = np.random.randint(0,10,shape)
into an array (new_array) with shape (3,2,2,2,2,2), where dimension 1 of array has been split into two (dimensions 1 and 2) and dimension 2 of array has been split into two (dimensions 3 and 4).
So far I have a working method:
div_x = 2
div_y = 2
new_dim_x = shape[1]//div_x
new_dim_y = shape[2]//div_y
new_array_split = np.array([np.split(each_sub, axis=2, indices_or_sections=div_y) for each_sub in np.split(array[:, :(new_dim_x*div_x), :(new_dim_y*div_y)], axis=1, indices_or_sections=div_x)])
I'm also looking into using reshape:
new_array_reshape = array[:, :(div_x*new_dim_x), :(div_y*new_dim_y), ...].reshape(shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1]).transpose(1,2,0,3,4,5)
The reshape method is faster than the split method:
%timeit array[:, :(div_x*new_dim_x), :(div_y*new_dim_y), ...].reshape(shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1]).transpose(1,2,0,3,4,5)
2.16 µs ± 44.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.array([np.split(each_sub, axis=2, indices_or_sections=div_y) for each_sub in np.split(array[:, :(new_dim_x*div_x), :(new_dim_y*div_y)], axis=1, indices_or_sections=div_x)])
58.3 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
However, I cannot get the same results from the two methods; I suspect this is because of the last dimension:
print('Reshape method')
print(new_array_reshape[1,0,0,...])
print('\nSplit method')
print(new_array_split[1,0,0,...])
Reshape method
[[[2 2]
  [4 3]]

 [[3 5]
  [5 9]]]

Split method
[[[2 2]
  [4 3]]

 [[5 3]
  [9 8]]]
The split method does exactly what I want: I checked it number by number and it produces the split I need, but not at the speed I would like.
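To make "exactly what I want" concrete, here is a minimal sketch of the layout the split method produces. It assumes a seeded generator (rng) in place of np.random.randint, purely for reproducibility; the names cropped and block are illustrative and not part of the original code. Each entry new_array_split[i, j] should equal the (i, j) spatial block of array across all time steps and both channels.

```python
import numpy as np

shape = (3, 4, 4, 2)
rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
array = rng.integers(0, 10, shape)

div_x, div_y = 2, 2
new_dim_x = shape[1] // div_x
new_dim_y = shape[2] // div_y

# Same split method as above, written over several lines.
cropped = array[:, :new_dim_x * div_x, :new_dim_y * div_y]
new_array_split = np.array([
    np.split(each_sub, div_y, axis=2)
    for each_sub in np.split(cropped, div_x, axis=1)
])

# Axis order: (block_x, block_y, time, x_in_block, y_in_block, channel)
print(new_array_split.shape)  # (2, 2, 3, 2, 2, 2)

# Every block matches a plain slice of the original array.
for i in range(div_x):
    for j in range(div_y):
        block = array[:,
                      i * new_dim_x:(i + 1) * new_dim_x,
                      j * new_dim_y:(j + 1) * new_dim_y,
                      :]
        assert np.array_equal(new_array_split[i, j], block)
```

Note that the result has the two block axes in front of the time axis, i.e. shape (2, 2, 3, 2, 2, 2) for this input.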
QUESTION
Is there a way to achieve the same results as the split method, using reshape or any other approach?
CONTEXT
The array is actually flow data from image processing, where the first dimension of array is time, the second is the x coordinate (4), the third is the y coordinate (4), and the fourth (2) holds the magnitude and phase of the flow.
I would like to split the images (x and y coordinates) into subimages, giving an array of 2x2 pictures, so I can analyse the flow more locally, perform averages, clustering, etc.
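As an illustration of the kind of local analysis meant here, the following sketch averages the flow over each subimage. It assumes the blocked layout produced by the split method, (block_x, block_y, time, x_in_block, y_in_block, channel), and uses a seeded generator and the hypothetical name block_mean for illustration only.

```python
import numpy as np

shape = (3, 4, 4, 2)
rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
array = rng.integers(0, 10, shape)

div_x, div_y = 2, 2
new_dim_x = shape[1] // div_x
new_dim_y = shape[2] // div_y

cropped = array[:, :new_dim_x * div_x, :new_dim_y * div_y]
new_array_split = np.array([
    np.split(each_sub, div_y, axis=2)
    for each_sub in np.split(cropped, div_x, axis=1)
])

# Average over the two within-block spatial axes (axes 3 and 4).
# Result axes: (block_x, block_y, time, channel)
block_mean = new_array_split.mean(axis=(3, 4))
print(block_mean.shape)  # (2, 2, 3, 2)
```

This is a plain arithmetic mean of both channels; averaging a phase this way ignores wrap-around, so it is only a placeholder for whatever statistic is actually needed.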
This process (splitting) is going to be performed many times, which is why I'm looking for an optimal and efficient solution. I believe reshape is probably the way to go, but I'm open to any other option.
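One candidate worth checking is a reshape that keeps each spatial axis next to its own block axis, followed by a transpose that moves the block axes to the front. The sketch below compares it against the split method. It assumes a seeded generator, and the names cropped and candidate are illustrative.

```python
import numpy as np

shape = (3, 4, 4, 2)
rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
array = rng.integers(0, 10, shape)

div_x, div_y = 2, 2
new_dim_x = shape[1] // div_x
new_dim_y = shape[2] // div_y
cropped = array[:, :div_x * new_dim_x, :div_y * new_dim_y, ...]

# Reference: the split method.
new_array_split = np.array([
    np.split(each_sub, div_y, axis=2)
    for each_sub in np.split(cropped, div_x, axis=1)
])

# Candidate: split x into (div_x, new_dim_x) and y into (div_y, new_dim_y),
# keeping each pair adjacent, then bring the two block axes to the front.
candidate = cropped.reshape(
    shape[0], div_x, new_dim_x, div_y, new_dim_y, shape[-1]
).transpose(1, 3, 0, 2, 4, 5)

print(np.array_equal(candidate, new_array_split))  # True
```

The difference from the reshape attempt above is only the axis order: (div_x, new_dim_x, div_y, new_dim_y) instead of (div_x, div_y, new_dim_x, new_dim_y), with the transpose adjusted to match. Since transpose returns a view, any timing of this expression does not include a copy into contiguous memory.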