There are a number of questions about reshaping matrices with NumPy here on Stack Overflow. I have found one that is closely related to what I am trying to achieve. However, its answer is not general enough for my application. So here we are.

I have a matrix with millions of rows (shape m x n) that looks like this:

[[0, 0, 0, 0],
 [1, 1, 1, 1],
 [2, 2, 2, 2],
 [3, 3, 3, 3],
 [4, 4, 4, 4],
 [5, 5, 5, 5],
 [6, 6, 6, 6],
 [7, 7, 7, 7],
 [...]]

From this I would like to get to a shape of m/2 x 2n, as shown below. To do that, one takes blocks of consecutive rows, skipping every other block (blocks of two rows in this example). Each taken block is then horizontally stacked onto the untouched block before it. In this example that means:

  1. The first two rows stay as they are.
  2. Take rows two and three and horizontally concatenate them to rows zero and one.
  3. Take rows six and seven and horizontally concatenate them to rows four and five. This concatenated block then becomes rows two and three.
  4. ...

[[0, 0, 0, 0, 2, 2, 2, 2],
 [1, 1, 1, 1, 3, 3, 3, 3],
 [4, 4, 4, 4, 6, 6, 6, 6],
 [5, 5, 5, 5, 7, 7, 7, 7],
 [...]]

How would I do that most efficiently (in terms of the least computation time) using NumPy? And would it make sense to speed the process up using Numba? Or is there not much to speed up?

paweller
  • Please expand on how the second array is computed from the first, and also what its shape is – FBruzzesi May 27 '21 at 17:01
  • What are you doing now that isn't efficient? This is more of a debugging help-line than a code writing/guessing one! – hpaulj May 27 '21 at 17:33
  • @FBruzzesi: I edited my question to hopefully provide all the details you asked for. – paweller May 27 '21 at 18:03
  • @hpaulj: As of right now I am not doing anything. That's why I ask how to do it. I could think of a way including a bunch of for loops. But I thought there has to be a more elegant and possibly computationally more efficient way using [Numpy's array manipulation routines](https://numpy.org/doc/stable/reference/routines.array-manipulation.html). Sorry for the unclear code snippets in the first place. I hope the updated ones are easier to understand. – paweller May 27 '21 at 18:03

2 Answers

Assuming your array's length is divisible by 4, here is one way to do it using numpy.hstack, after creating the correct indices for selecting the rows for the "left" and "right" parts of the resulting array:

import numpy as np
# Create the array
N = 1000*4
a = np.hstack([np.arange(0, N)[:, None]]*4) #shape (4000, 4)
a
array([[   0,    0,    0,    0],
       [   1,    1,    1,    1],
       [   2,    2,    2,    2],
       ...,
       [3997, 3997, 3997, 3997],
       [3998, 3998, 3998, 3998],
       [3999, 3999, 3999, 3999]])

left_idx = np.array([np.array([0,1]) + 4*i for i in range(N//4)]).reshape(-1)
right_idx = np.array([np.array([2,3]) + 4*i for i in range(N//4)]).reshape(-1)

r = np.hstack([a[left_idx], a[right_idx]]) #shape (2000, 8)
r
array([[   0,    0,    0, ...,    2,    2,    2],
       [   1,    1,    1, ...,    3,    3,    3],
       [   4,    4,    4, ...,    6,    6,    6],
       ...,
       [3993, 3993, 3993, ..., 3995, 3995, 3995],
       [3996, 3996, 3996, ..., 3998, 3998, 3998],
       [3997, 3997, 3997, ..., 3999, 3999, 3999]])
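
As a possible variation (not part of the original answer), the two index arrays could be built with broadcasting instead of Python list comprehensions. This is only a sketch; the name `starts` is introduced here for illustration, and the fancy indexing and `np.hstack` still allocate new arrays.

```python
import numpy as np

N = 1000 * 4
a = np.hstack([np.arange(N)[:, None]] * 4)  # shape (4000, 4)

# Start index of every group of four rows: 0, 4, 8, ...
starts = np.arange(0, N, 4)[:, None]

# Broadcasting adds the in-group offsets to every start index.
left_idx = (starts + np.array([0, 1])).reshape(-1)   # 0, 1, 4, 5, ...
right_idx = (starts + np.array([2, 3])).reshape(-1)  # 2, 3, 6, 7, ...

r = np.hstack([a[left_idx], a[right_idx]])  # shape (2000, 8)
```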
FBruzzesi
  • Thanks, that works. However, it feels somewhat slow. Probably that is due to the `r = np.hstack(...)` which allocates new space in memory. Do you happen to know whether there is a way to achieve the same by only changing the `view` of the matrix? That would most likely speed up things quite a bit. – paweller May 28 '21 at 07:10

Here's an application of the swapaxes answer in your link.

In [11]: x=np.array([[0, 0, 0, 0],
    ...:  [1, 1, 1, 1],
    ...:  [2, 2, 2, 2],
    ...:  [3, 3, 3, 3],
    ...:  [4, 4, 4, 4],
    ...:  [5, 5, 5, 5],
    ...:  [6, 6, 6, 6],
    ...:  [7, 7, 7, 7]])

Break the array into 'groups' with a reshape, keeping the number of columns (4) unchanged.

In [17]: x.reshape(2,2,2,4)
Out[17]: 
array([[[[0, 0, 0, 0],
         [1, 1, 1, 1]],

        [[2, 2, 2, 2],
         [3, 3, 3, 3]]],


       [[[4, 4, 4, 4],
         [5, 5, 5, 5]],

        [[6, 6, 6, 6],
         [7, 7, 7, 7]]]])

Swap the 2 middle dimensions, regrouping rows:

In [18]: x.reshape(2,2,2,4).transpose(0,2,1,3)
Out[18]: 
array([[[[0, 0, 0, 0],
         [2, 2, 2, 2]],

        [[1, 1, 1, 1],
         [3, 3, 3, 3]]],


       [[[4, 4, 4, 4],
         [6, 6, 6, 6]],

        [[5, 5, 5, 5],
         [7, 7, 7, 7]]]])

Then back to the target shape. This final step creates a copy of the original (the previous steps were views):

In [19]: x.reshape(2,2,2,4).transpose(0,2,1,3).reshape(4,8)
Out[19]: 
array([[0, 0, 0, 0, 2, 2, 2, 2],
       [1, 1, 1, 1, 3, 3, 3, 3],
       [4, 4, 4, 4, 6, 6, 6, 6],
       [5, 5, 5, 5, 7, 7, 7, 7]])
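
The view-versus-copy behaviour of the three steps can be checked with `np.shares_memory`. A minimal sketch, assuming the same 8 x 4 example array (rebuilt here with `np.repeat`); the names `grouped`, `swapped` and `result` are only illustrative:

```python
import numpy as np

x = np.repeat(np.arange(8), 4).reshape(8, 4)  # same values as x above

grouped = x.reshape(2, 2, 2, 4)          # view on x
swapped = grouped.transpose(0, 2, 1, 3)  # still a view, but no longer contiguous
result = swapped.reshape(4, 8)           # cannot be expressed with strides, so NumPy copies

shares = [np.shares_memory(x, arr) for arr in (grouped, swapped, result)]
print(shares)  # [True, True, False]
```

Because the final layout interleaves rows from different places in the original buffer, the (4, 8) result cannot be obtained as a pure view; one copy of the data is needed.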

It's hard to generalize this, since there are different ways of rearranging blocks. For example my first try produced:

In [16]: x.reshape(4,2,4).transpose(1,0,2).reshape(4,8)
Out[16]: 
array([[0, 0, 0, 0, 2, 2, 2, 2],
       [4, 4, 4, 4, 6, 6, 6, 6],
       [1, 1, 1, 1, 3, 3, 3, 3],
       [5, 5, 5, 5, 7, 7, 7, 7]])
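
For the one rearrangement asked about in the question, the reshape/transpose steps could be wrapped up for arbitrary m, n and block size. This is only a sketch: `pair_row_blocks` and its `block` parameter are hypothetical names, and it assumes the row count is divisible by twice the block size.

```python
import numpy as np

def pair_row_blocks(a, block=2):
    """Hypothetical helper: put every second block of `block` rows
    to the right of the block that precedes it."""
    m, n = a.shape
    if m % (2 * block) != 0:
        raise ValueError("row count must be divisible by 2 * block")
    # Axes after the reshape: (group, half, row within block, column).
    grouped = a.reshape(m // (2 * block), 2, block, n)
    # Move 'half' next to 'column' so both halves land in the same output row.
    return grouped.transpose(0, 2, 1, 3).reshape(m // 2, 2 * n)

x = np.repeat(np.arange(8), 4).reshape(8, 4)
out = pair_row_blocks(x)  # same (4, 8) result as Out[19]
```

Other block arrangements would need a different reshape and axis order, which is why no single recipe covers them all.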
hpaulj