python - repeating numpy array without replicating data

Question

This question has been asked before, but the solution only works for 1D/2D arrays, and I need a more general answer.

How do you create a repeating array without replicating the data? This strikes me as something of general use, as it would help to vectorize python operations without the memory hit.

More specifically, I have a (y,x) array, which I want to tile multiple times to create a (z,y,x) array. I can do this with numpy.tile(array, (nz,1,1)), but I run out of memory. My specific case has x=1500, y=2000, z=700.

What are you going to do with the larger array? `array[None,:,:]` may be just as useful as the tiled array. Unless you do some sort of `dot` product on the y or x dimension, you could still end up with a memory error. – hpaulj May 16 '14 at 21:33
I have to apply a geographical mask to a geophysical dataset of shape (time, y, x). The module I'm using requires the mask to have the same shape as the dataset, which is why I need to replicate the (y, x) mask along the time dimension. – user3644731 May 19 '14 at 08:51
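
When the consuming code accepts broadcasting (unlike the module described in the comment above), the `array[None,:,:]` suggestion is often enough on its own. A minimal sketch, with small made-up shapes and hypothetical names (`data`, `mask2d`) standing in for the real dataset and mask:

```python
import numpy as np

# Small hypothetical shapes standing in for (time, y, x) = (700, 2000, 1500)
nt, ny, nx = 4, 3, 5
data = np.arange(nt * ny * nx, dtype=float).reshape(nt, ny, nx)

# Hypothetical 2D mask: True marks grid cells to exclude
mask2d = np.zeros((ny, nx), dtype=bool)
mask2d[0, :] = True

# mask2d[None, :, :] has shape (1, ny, nx); NumPy broadcasts it along the
# time axis during the operation, so no (nt, ny, nx) mask is ever stored
masked = np.where(mask2d[None, :, :], np.nan, data)

print(masked.shape)  # (4, 3, 5)
```

Only the result `masked` is allocated at full size; the mask itself stays 2D.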

score 5 · Accepted Answer · edited May 23 '17 at 10:28

One simple trick is to use np.broadcast_arrays to broadcast your (y, x) array against a z-long vector along the first dimension:

import numpy as np

M = np.arange(1500*2000).reshape(1500, 2000)
z = np.zeros(700)

# broadcasting over the first dimension
_, M_broadcast = np.broadcast_arrays(z[:, None, None], M[None, ...])

# owndata is False because M_broadcast is a view onto M, not a copy
print(M_broadcast.shape, M_broadcast.flags.owndata)
# (700, 1500, 2000) False

To generalize the stride_tricks method given for a 1D array in this answer, you just need to include the shape and stride length for each dimension of your output array:

M_strided = np.lib.stride_tricks.as_strided(
                M,                              # input array
                (700, M.shape[0], M.shape[1]),  # output dimensions
                (0, M.strides[0], M.strides[1]) # stride length in bytes
            )
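
As a possible alternative that is not part of the original answer: NumPy 1.10 and later provide `np.broadcast_to`, which builds the same kind of zero-stride view without the dummy `z` vector and without hand-written strides. A short sketch, assuming such a NumPy version is available (`M_view` is just an illustrative name):

```python
import numpy as np

M = np.arange(1500 * 2000).reshape(1500, 2000)

# Read-only view with a new leading axis of length 700; no data is copied
M_view = np.broadcast_to(M, (700,) + M.shape)

print(M_view.shape)            # (700, 1500, 2000)
print(M_view.strides[0])       # 0 -> every "layer" points at the same memory
print(M_view.flags.writeable)  # False

# Changing the original is visible in every layer of the view
M[0, 0] = -1
print(M_view[699, 0, 0])       # -1
```

The view is read-only by design: with a stride of 0, writing to one element would silently change all 700 layers at once.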

answered May 16 '14 at 13:13

ali_m

The broadcasting approach does exactly what I wanted. It seems simpler and more logical to me than the stride_tricks method. – user3644731 May 19 '14 at 08:49
Internally `broadcast_arrays` uses `as_strided` in exactly this way. Look in `numpy/lib/stride_tricks.py`. It's the `0` stride length for the first dimension that does the trick. – hpaulj May 19 '14 at 16:20
The `stride length in bytes` line should be `(0, M.strides[0], M.strides[1])` – hpaulj May 19 '14 at 16:39
@hpaulj that's interesting to know, although I'm sure that using `stride_tricks` directly is still more efficient than allocating another array just to broadcast against. – ali_m May 19 '14 at 16:40
`M[None,:,:]` has `shape: (1,...)` and `strides: (0,..)`. Same strides, but just a `1` in the new shape dimension. – hpaulj May 20 '14 at 02:04
@ali_m How would you perform further calculations on the broadcast array? Even an in-place calculation would result in a new array, wouldn't it? For example, `np.abs`. – Gulzar Oct 25 '20 at 15:47
@ali_m I would like, for example, to reduce the broadcast result over an axis after some calculation. How would I do that? – Gulzar Oct 25 '20 at 16:30
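
The last two comments went unanswered in the thread. One possible approach, offered here as a sketch and not as the answerer's own reply: because every layer of the view is identical, an elementwise function can be applied to the 2D array and re-broadcast, and reductions can be derived from the 2D result. The example uses tiny made-up shapes and `np.broadcast_to` (NumPy 1.10+); all variable names are illustrative.

```python
import numpy as np

nz = 4
M = np.array([[1, -2, 3], [-4, 5, -6]])         # shape (y, x) = (2, 3)
M_view = np.broadcast_to(M, (nz,) + M.shape)    # zero-stride (z, y, x) view

# Applying a ufunc to the view allocates a full (z, y, x) result
full = np.abs(M_view)
print(full.shape, full.flags.owndata)  # (4, 2, 3) True

# Cheaper: compute on the 2D array, then broadcast the result again
abs_view = np.broadcast_to(np.abs(M), (nz,) + M.shape)
assert np.array_equal(abs_view, full)

# Sum over the repeated z axis: nz identical layers, so scale the 2D result
sum_over_z = nz * np.abs(M)
assert np.array_equal(sum_over_z, full.sum(axis=0))

# Sum over y: reduce the 2D array once, then broadcast along z
sum_over_y = np.broadcast_to(np.abs(M).sum(axis=0), (nz, M.shape[1]))
assert np.array_equal(sum_over_y, full.sum(axis=1))
```

This shortcut only applies while the data really is constant along the repeated axis. Once the view is combined with genuinely 3D data, the result has to be stored at full size or processed in chunks.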
