25

I'd like to create a 1D NumPy array that would consist of 1000 back-to-back repetitions of another 1D array, without replicating the data 1000 times.

Is it possible?

If it helps, I intend to treat both arrays as immutable.

NPE
  • 2
    I came across this question while searching for large-data manipulation in Python. I read about strides and was wondering why you would need replicated data that is essentially the same (it points to the same data in memory). You can read from a single data set twice, can't you? I just want to know the reason you are doing this replication. Thanks. –  Dec 23 '11 at 19:12
  • I feel like a lot of the time when you might want to do this, what you *really* want is to use broadcasting. – endolith May 31 '22 at 14:21

5 Answers

28

You can't do this; a NumPy array must have a consistent stride along each dimension, while your strides would need to go one way most of the time but sometimes jump backwards.

The closest you can get is either a 1000-row 2D array where every row is a view of your first array, or a flatiter object, which behaves kind of like a 1D array. (flatiters support iteration and indexing, but you can't take views of them; all indexing makes a copy.)

Setup:

import numpy as np
a = np.arange(10)

2D view:

b = np.lib.stride_tricks.as_strided(a, (1000, a.size), (0, a.itemsize))

flatiter object:

c = b.flat
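
As a rough check of the two constructions above, the following sketch prints the strides of the 2D view and indexes the flatiter. The `writeable=False` argument is an addition not present in the answer; it is assumed here only because the question treats the arrays as immutable.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10)

# Stride 0 along axis 0 means every "row" re-reads the same 10 elements of `a`.
# writeable=False (an assumption, not in the answer) blocks writes through the view.
b = as_strided(a, shape=(1000, a.size), strides=(0, a.itemsize), writeable=False)
c = b.flat

print(b.shape)                       # (1000, 10)
print(b.strides == (0, a.itemsize))  # True
print(np.may_share_memory(a, b))     # True
print(c[25])                         # 5
print(c[8:12])                       # [8 9 0 1]
```

Indexing `c` with a slice returns a small copy of just the requested elements, so the wrap-around from 9 back to 0 is visible without materializing all 10,000 values.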
user2357112
Paul
  • Cool, I was wondering if strides could be used, but I couldn't figure out how! `b.flat` or `b.flatten()`? – Benjamin Apr 06 '11 at 14:57
  • 2
    `b.flat.base is b` is True; `b.flatten().base is b` is False, so you want `b.flat` – JoshAdel Apr 06 '11 at 15:04
  • Not sure what that means. `b.flatten().base` returns nothing... `(b.flat == b.flatten()).all()` is True, so what is the difference? – Benjamin Apr 06 '11 at 15:15
  • see http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.base.html. The difference is that your comparison tests if the values are the same on an element-wise basis. `.base` tells you about ownership of data. – JoshAdel Apr 06 '11 at 15:21
  • @JoshAdel: Thanks, that is useful to know. – Benjamin Apr 06 '11 at 15:32
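
The `.flat` versus `.flatten()` distinction from the comments above can be checked with a short sketch. It reuses the answer's `as_strided` setup; the variable names `flat_iter` and `flat_copy` are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10)
b = as_strided(a, (1000, a.size), (0, a.itemsize))

flat_iter = b.flat       # iterator over b; no data is copied
flat_copy = b.flatten()  # new 1D array holding all 10,000 elements

print(flat_iter.base is b)      # True
print(flat_copy.base is None)   # True
print(flat_copy.flags.owndata)  # True
print(flat_copy.size)           # 10000
```

So the element-wise comparison is True for both, but only `b.flat` avoids allocating memory for the repeated data.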
19

broadcast_to was added in numpy 1.10, which allows you effectively repeat an array with a little less effort.

Copying the style of the accepted answer:

import numpy as np
arr = np.arange(10)
repeated = np.broadcast_to(arr, (1000, arr.size))
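
A minimal sketch of what the result looks like: the broadcast array is a read-only view with a zero stride along the repeated axis, so no data is duplicated.

```python
import numpy as np

arr = np.arange(10)
repeated = np.broadcast_to(arr, (1000, arr.size))

print(repeated.shape)                         # (1000, 10)
print(repeated.strides == (0, arr.itemsize))  # True
print(repeated.flags.writeable)               # False
print(np.array_equal(repeated[123], arr))     # True
```

As with the `as_strided` approach, this gives a 2D array of repeated rows rather than a true 1D array.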
CharlesB
Erik
2

I'm not 100% sure what you mean by 'not replicating the data 1000 times'. If you are looking for a numpy method to build b from a in one fell swoop (rather than looping), you can use:

import numpy as np
a = np.arange(1000)
b = np.tile(a, 1000)  # copies the data: b holds 1,000,000 elements

Otherwise, I would do something like:

a = np.arange(1000)
ii = [700,2000,10000] # The indices you want of the tiled array
b = a[np.mod(ii,a.size)]

b is not a view of a in this case, because fancy indexing makes a copy, but at least it is a NumPy array that contains only the elements you want, without ever creating the full 1000*1000-element tiled array in memory.
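
The modular-indexing idea can be wrapped in a small function. This is only a sketch: `tiled_take` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def tiled_take(a, indices):
    """Hypothetical helper: return the values np.tile(a, k) would hold at
    `indices`, without ever building the tiled array."""
    return a[np.mod(indices, a.size)]

a = np.arange(1000)
ii = [700, 2000, 10000]  # positions in the virtual tiled array
b = tiled_take(a, ii)

print(b.tolist())  # [700, 0, 0]
```

Only `len(ii)` elements are allocated for the result, regardless of how many repetitions the virtual array is meant to have.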

As for making them immutable (see Immutable numpy array?), you would need to set the flag on each array separately, since copies don't retain the flag setting.

JoshAdel
0

I do not claim that this is the most elegant solution, because you have to fool NumPy into creating an array of objects (see the line with the comment):

from numpy import array

n = 3

a = array([1,2])
a.setflags(write=False)
t = [a]*n + [array([1])] # Append spurious array that is not len(a)
r = array(t,dtype=object)
r.setflags(write=False)

assert id(a) == id(t[1]) == id(r[1])
lafras
-1

Would this work?

import numpy
a = numpy.array([1, 2, 3, 4])
b = numpy.ones((1000, a.shape[0]))
b *= a
b = b.flatten()
Benjamin
  • 2
    This seems like a very expensive way of doing things if you are going to produce a copy, and is ~15x slower than just using `np.tile` – JoshAdel Apr 06 '11 at 15:08
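
For comparison with the comment above, here is a sketch that checks the multiply-and-flatten approach against `np.tile`. It makes no timing claims; it only shows that both produce full copies with equal values, and that the `numpy.ones` route yields a float array.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Approach from the answer: fill a float array of ones, scale it, then flatten.
b = np.ones((1000, a.shape[0]))
b *= a
b = b.flatten()

# Direct alternative mentioned in the comment.
tiled = np.tile(a, 1000)

print(np.array_equal(b, tiled))  # True
print(b.dtype)                   # float64
print(tiled.dtype == a.dtype)    # True
```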