How does NumPy implement flatten? Or, more generally, how is a strided array flattened?

Question

I am trying to understand what happens in Python under the hood for certain operations. From this reply, I understand how strides work and why they matter. What I would like to know now is this: a transpose does not 'physically' reorder the data in memory, yet when I call .flatten(order="C") after a transpose, the data comes out correctly ordered. Given the strides, I know it must be possible to implement this operation, but I can't come up with an algorithm that works for any 'transposed' strides.

import numpy as np

array = np.arange(24).reshape(2, 3, 4)
print(array.flatten(order='C'))
array = array.transpose(1, 0, 2)
print(array.flatten(order='C'))

>>> [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
>>> [ 0  1  2  3 12 13 14 15  4  5  6  7 16 17 18 19  8  9 10 11 20 21 22 23]

score 1 · Answer 1 · answered Mar 29 '23 at 21:24

You can check the strides attribute to see what is going on in practice (the byte values below are for a 4-byte integer dtype; with 8-byte integers they double):

array = np.arange(24).reshape(2, 3, 4)
print(array.strides)                      # (48, 16, 4)
tmp = array.flatten(order='C')
print(tmp.strides)                        # (4,)
array = array.transpose(1, 0, 2)
print(array.strides)                      # (16, 48, 4)
tmp = array.flatten(order='C')
print(tmp.strides)                        # (4,)

As we can see, the flattened array is always contiguous, while the transposed (not flattened) array is not.

In fact, NumPy tries never to copy data unless you ask it to (or for basic out-of-place operations). That said, there are cases like this one where NumPy has no choice but to create a copy of the target array. Indeed, the stride is uniform along a given axis (by design), and a flattened array has a single axis and therefore a single stride. No single stride can visit the transposed elements in C order inside the original buffer, so the flattened transposed array has to be a new, contiguous copy.
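The view-versus-copy behaviour can be checked directly. This is a small sketch using `np.shares_memory` and the `flags` attribute; the variable names `arrt` and `flat` are only illustrative, and the results do not depend on the integer dtype:

```python
import numpy as np

array = np.arange(24).reshape(2, 3, 4)
arrt = array.transpose(1, 0, 2)

# The transpose is a view: same buffer, different strides.
print(np.shares_memory(array, arrt))            # True
print(arrt.flags['C_CONTIGUOUS'])               # False

# flatten always returns a fresh, contiguous copy.
flat = arrt.flatten(order='C')
print(np.shares_memory(arrt, flat))             # False
print(flat.flags['C_CONTIGUOUS'])               # True

# ravel returns a view when it can, and a copy when it cannot.
print(np.shares_memory(array, array.ravel()))   # True  (contiguous input)
print(np.shares_memory(arrt, arrt.ravel()))     # False (transposed input)
```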

@Chrysophylaxs, I think he's just saying that like the `shape`, there's one `stride` value per axis. A flattened array is 1d, and thus has 1 stride value, the `itemsize`. While `flatten` always makes a copy, `ravel` like `reshape` "tries" not to, but following something like a `transpose`, it too will be a copy. — hpaulj, Mar 30 '23 at 03:05

score 1 · Accepted Answer · answered Mar 29 '23 at 22:38

Calling your transposed array arrt, we see that:

In [372]: arrt.shape, arrt.strides
Out[372]: ((3, 2, 4), (16, 48, 4))

So the strides, expressed in elements rather than bytes (that is, divided by the 4-byte itemsize), are (4, 12, 1).
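As a quick check, the element strides can be computed from `strides` and `itemsize`, which gives the same result whatever the integer width on your platform. The name `elem_strides` is just an illustrative choice:

```python
import numpy as np

arrt = np.arange(24).reshape(2, 3, 4).transpose(1, 0, 2)

# Byte strides divided by the size of one element give strides in elements.
elem_strides = tuple(s // arrt.itemsize for s in arrt.strides)
print(elem_strides)   # (4, 12, 1)
```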

The ravel/flatten can then be produced with:

In [373]: res = np.zeros(arrt.size, int)
     ...: rcnt = 0
     ...: for i in range(0,3):
     ...:     for j in range(0,2):
     ...:         for k in range(0, 4):
     ...:             res[rcnt] = arrt.base[i*4+j*12+k*1]
     ...:             rcnt += 1
     ...:             

In [374]: res
Out[374]: 
array([ 0,  1,  2,  3, 12, 13, 14, 15,  4,  5,  6,  7, 16, 17, 18, 19,  8,
        9, 10, 11, 20, 21, 22, 23])

arrt.base is the original np.arange. The innermost loop (k) steps through the base by 1, the j loop steps by 12, and the outer (i) loop steps by 4.

The actual compiled code will differ in details, but this gives the general idea of how strides can be used to map from one array shape to another. — hpaulj, Mar 30 '23 at 18:11
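The hard-coded loops above generalize to any number of axes and any permutation of strides. What follows is a pure-Python sketch of that idea, not NumPy's actual C implementation: the helper `flatten_strided` is hypothetical, it takes strides in elements (not bytes), and it assumes the view starts at position `offset` of a flat buffer (0 here). It walks the indices like an odometer, with the last axis changing fastest, which is what C order means:

```python
import numpy as np

def flatten_strided(buf, shape, strides, offset=0):
    """Read a strided view of the flat buffer `buf` in C order (hypothetical helper)."""
    total = 1
    for dim in shape:
        total *= dim

    out = []
    index = [0] * len(shape)   # current multi-dimensional index
    pos = offset               # matching position in the flat buffer
    for _ in range(total):
        out.append(buf[pos])
        # Odometer step: bump the last axis, carrying into earlier axes.
        for axis in reversed(range(len(shape))):
            index[axis] += 1
            pos += strides[axis]
            if index[axis] < shape[axis]:
                break
            # This axis overflowed: rewind it to 0 and carry on.
            pos -= strides[axis] * shape[axis]
            index[axis] = 0
    return out

base = np.arange(24)
arrt = base.reshape(2, 3, 4).transpose(1, 0, 2)
elem_strides = [s // arrt.itemsize for s in arrt.strides]

result = flatten_strided(base, arrt.shape, elem_strides)
print(np.array_equal(result, arrt.flatten(order='C')))   # True
```

The same function works for other axis orders, for example `base.reshape(2, 3, 4).transpose(2, 0, 1)`, because only the `shape` and `strides` arguments change.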
