NumPy array initialization (fill with identical values)

Question

I need to create a NumPy array of length n, each element of which is v.

Is there anything better than:

from numpy import empty

a = empty(n)
for i in range(n):
    a[i] = v

I know zeros and ones would work for v = 0, 1. I could use v * ones(n), but it ~~won't work when v is None, and also~~ would be much slower.

On my computer, for the 0 case, using `a = np.zeros(n)` in the loop is faster than `a.fill(0)`. This is counter to what I expected since I thought `a=np.zeros(n)` would need to allocate and initialize new memory. If anyone can explain this, I would appreciate it. — user3731622, Sep 27 '16 at 23:40
You cannot put None in a numeric NumPy array, since the cells are created with a specific data type while None has its own type and is in fact a pointer. — Camion, May 15 '19 at 18:07
@Camion Yeah I know now :) Of course `v * ones(n)` is still horrible, as it uses the expensive multiplication. Replace `*` with `+` though, and `v + zeros(n)` turns out to be surprisingly good in some cases (https://stackoverflow.com/questions/5891410/numpy-array-initialization-fill-with-identical-values?noredirect=1#comment87321908_20606278). — max, May 19 '19 at 20:15
max, instead of creating an array with zeros before adding v, it is even faster to create it empty with `var = np.empty(n)` and then to fill it with `var[:] = v`. (btw, `np.full()` is as fast as this) — Camion, May 19 '19 at 21:02
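On the `None` question raised in these comments: a small sketch, assuming NumPy 1.12 or later (where `np.full` takes its dtype from the fill value when none is given). `None` can be stored in a NumPy array whose dtype is `object`, in which case each cell holds a reference to the Python `None` object rather than a machine number.

```python
import numpy as np

n = 4

# No dtype given: np.full uses np.asarray(None).dtype, which is object,
# so every cell is a reference to the same Python None object.
a = np.full(n, None)
print(a.dtype)  # object
print(a)        # [None None None None]
```

Object arrays lose most of NumPy's speed advantage, so this is mainly useful as a placeholder container, not for numeric work.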

score 527 · Accepted Answer · edited Apr 05 '16 at 15:03


NumPy 1.8 introduced np.full(), which is a more direct method than empty() followed by fill() for creating an array filled with a certain value:

>>> np.full((3, 5), 7)
array([[ 7.,  7.,  7.,  7.,  7.],
       [ 7.,  7.,  7.,  7.,  7.],
       [ 7.,  7.,  7.,  7.,  7.]])

>>> np.full((3, 5), 7, dtype=int)
array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])

This is arguably the way of creating an array filled with certain values, because it explicitly describes what is being achieved (and it can in principle be very efficient since it performs a very specific task).
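The float output shown above is what NumPy printed at the time. To my knowledge, since NumPy 1.12 `np.full` infers the dtype from the fill value when `dtype` is omitted, so an integer fill now gives an integer array. A quick check, assuming a current NumPy:

```python
import numpy as np

ints = np.full((3, 5), 7)                    # dtype inferred from the int fill value
floats = np.full((3, 5), 7.0)                # float fill value gives float64
explicit = np.full((3, 5), 7, dtype=float)   # an explicit dtype always wins

print(ints.dtype.kind, floats.dtype, explicit.dtype)

# Same values in all three; only the element type differs.
print(np.array_equal(ints, floats), np.array_equal(ints, explicit))
```

Passing `dtype` explicitly, as in the second example of the answer, keeps the result independent of the NumPy version.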

edited Apr 05 '16 at 15:03 by MSeifert · answered Dec 16 '13 at 08:24 by Eric O. Lebigot

This full() method is working well for me but I can't find a bit of documentation for it. Can anyone point me to the right place? – James Adams Jan 17 '14 at 16:39
You can at least do `help(numpy.full)` in a Python shell. I am also surprised that it is not in the web documentation. – Eric O. Lebigot Jan 22 '14 at 13:49
On my system (Python 2.7, Numpy 1.8), np.full() is actually slightly slower than np.empty() followed by np.fill(). – John Zwinck Jul 25 '14 at 08:37
For 10,000 elements, I observe the same thing (except that `np.fill()` does not exist and should be `arr.fill()`), with a difference of about 10 %. If the difference were bigger, I would raise an issue in the NumPy bug tracker. :) I prefer more explicit and clearer code for such a small difference in execution time, so I go with `np.full()` all the time. – Eric O. Lebigot Jul 26 '14 at 08:03
On my machine, np.full() is the same speed as ndarray.fill(). – Fnord Apr 20 '16 at 06:15
While logically `np.full()` is the way to go, I observed under some circumstances that `np.zeros() + val` can be faster (e.g. for very large arrays on an HPC system). – norok2 May 03 '18 at 09:58

score 113 · Answer 2 · edited Jun 10 '21 at 17:16

Updated for NumPy 1.7.0 (hat tip to @Rolf Bartstra):

a=np.empty(n); a.fill(5) is fastest.

In descending speed order:

%timeit a=np.empty(10000); a.fill(5)
100000 loops, best of 3: 5.85 us per loop

%timeit a=np.empty(10000); a[:]=5 
100000 loops, best of 3: 7.15 us per loop

%timeit a=np.ones(10000)*5
10000 loops, best of 3: 22.9 us per loop

%timeit a=np.repeat(5,(10000))
10000 loops, best of 3: 81.7 us per loop

%timeit a=np.tile(5,[10000])
10000 loops, best of 3: 82.9 us per loop
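The figures above come from IPython's `%timeit`. Below is a plain-Python sketch using the standard-library `timeit` module so the comparison can be rerun outside IPython; the size, the repeat counts, and the helper names (`empty_fill`, `empty_colon`, and so on) are illustrative choices, and the ranking will likely differ across machines and NumPy versions.

```python
import timeit

import numpy as np

N = 10000      # integer size; recent NumPy rejects float sizes such as 1e4
NUMBER = 200   # calls per timing run


def empty_fill():
    a = np.empty(N)
    a.fill(5)
    return a


def empty_colon():
    a = np.empty(N)
    a[:] = 5
    return a


def ones_times():
    return np.ones(N) * 5


def repeat():
    return np.repeat(5, N)


def tile():
    return np.tile(5, N)


def full():
    return np.full(N, 5)


candidates = [empty_fill, empty_colon, ones_times, repeat, tile, full]

# Make sure every candidate builds the same values before timing anything.
reference = empty_fill()
for func in candidates:
    assert np.array_equal(func(), reference), func.__name__

for func in candidates:
    # Best of 5 runs of NUMBER calls each, reported per call in microseconds.
    best = min(timeit.repeat(func, number=NUMBER, repeat=5))
    print(f"{func.__name__:12s} {best / NUMBER * 1e6:8.2f} us per call")
```

Note that `repeat`, `tile` and `full` with an integer 5 produce integer arrays here, while the `empty` and `ones` variants produce floats; `np.array_equal` compares values, so the check still passes.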

edited Jun 10 '21 at 17:16 by Chris · answered Oct 24 '12 at 15:19 by Yariv

Adding a timing for the more recent and direct `np.full()` would be useful. On my machine, with NumPy 1.8.1, it is about 15% slower than the less direct `fill()` version (which is unexpected, as `full()` has the potential of going slightly faster). – Eric O. Lebigot May 14 '14 at 06:44
@DavidSanders: I am not sure I am following you: `fill()` is the fastest solution. The multiplication solution is much slower. – Eric O. Lebigot Jun 23 '15 at 03:22
Note: if speed is really a concern, using a size of `10000` instead of `1e4` makes a noticeable difference, for some reason (`full()` is almost 50% slower, with `1e4`). – Eric O. Lebigot Jun 23 '15 at 03:27
Just adding my results with `full()`, it runs considerably slower when the datatype isn't explicitly a float. Otherwise, it's comparable (but slightly slower) with the best methods here. – user2699 Jul 26 '16 at 23:51
@user2699 I am not observing this, with 100,000 elements: `full(100000, 5)`, `full(100000, 5, dtype=float)`, `full(100000, 5, dtype=int)` and `a =np.empty(100000); a.fill(5)` all take about the same time on my machine (with no caching: `%timeit -r1 -n1 …`) (NumPy 1.11.2). – Eric O. Lebigot Oct 08 '16 at 09:40
I wonder if CPU caching and pipelining might be affecting these timing outcomes, especially when the differences are a few microseconds. A good test would be to run it both locally and, let's say, on colab.google.com or something similar. At least then you'd be able to see whether local caching effects are negligible compared to the cloud architecture. – CodeSamurai-like Jan 01 '22 at 17:20

score 76 · Answer 3 · answered May 05 '11 at 00:40

I believe fill is the fastest way to do this.

import numpy as np

a = np.empty(10)
a.fill(7)

You should also always avoid iterating like you are doing in your example. A simple a[:] = v will accomplish what your iteration does using numpy broadcasting.
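To illustrate the last point, a small sketch (the values of `n` and `v` are arbitrary): the explicit loop from the question and the single slice assignment `a[:] = v` produce the same array, but the slice assignment does the work in one vectorized call instead of one Python-level assignment per element.

```python
import numpy as np

n, v = 6, 7.5

# The question's approach: one Python-level assignment per element.
looped = np.empty(n)
for i in range(n):
    looped[i] = v

# Slice assignment: the scalar v is broadcast to every element at once.
broadcast = np.empty(n)
broadcast[:] = v

print(np.array_equal(looped, broadcast))  # True
```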

edited Apr 05 '16 at 16:10 · answered May 05 '11 at 00:40 by Paul

Thank you. In looking at `fill`, I saw that `repeat` suits my needs even better. – max May 05 '11 at 00:57
Do you mind updating your answer to say that your recommendation of `a[:]=v` is actually faster overall than the `fill`? – max Oct 24 '12 at 21:21
@max Is it faster? Broadcasting is a more general way to fill an array and I would guess is slower or equal to the very narrow use case of `fill`. – Paul Oct 24 '12 at 22:52

Nico Schlömer · Answer 4 · 2021-08-15T19:10:22.183

I had np.array(n * [value]) in mind, but apparently that is slower than all other suggestions for large enough n. The best in terms of readability and speed is

np.full(n, 3.14)

Here is a full comparison made with perfplot (a pet project of mine).

The two empty alternatives are still the fastest (with NumPy 1.12.1); np.full catches up for large arrays.

Code to generate the plot:

import numpy as np
import perfplot


def empty_fill(n):
    a = np.empty(n)
    a.fill(3.14)
    return a


def empty_colon(n):
    a = np.empty(n)
    a[:] = 3.14
    return a


def ones_times(n):
    return 3.14 * np.ones(n)


def repeat(n):
    return np.repeat(3.14, (n))


def tile(n):
    return np.tile(3.14, [n])


def full(n):
    return np.full((n), 3.14)


def list_to_array(n):
    return np.array(n * [3.14])


perfplot.show(
    setup=lambda n: n,
    kernels=[empty_fill, empty_colon, ones_times, repeat, tile, full, list_to_array],
    n_range=[2 ** k for k in range(27)],
    xlabel="len(a)",
    logx=True,
    logy=True,
)

Rolf Bartstra · Answer 5 · 2012-11-05T14:07:57.737

Apparently, not only the absolute speeds but also the speed order (as reported by user1579844) are machine-dependent; here's what I found:

a=np.empty(1e4); a.fill(5) is fastest.

In descending speed order:

timeit a=np.empty(1e4); a.fill(5) 
# 100000 loops, best of 3: 10.2 us per loop
timeit a=np.empty(1e4); a[:]=5
# 100000 loops, best of 3: 16.9 us per loop
timeit a=np.ones(1e4)*5
# 100000 loops, best of 3: 32.2 us per loop
timeit a=np.tile(5,[1e4])
# 10000 loops, best of 3: 90.9 us per loop
timeit a=np.repeat(5,(1e4))
# 10000 loops, best of 3: 98.3 us per loop
timeit a=np.array([5]*int(1e4))
# 1000 loops, best of 3: 1.69 ms per loop (slowest BY FAR!)

So, try and find out, and use what's fastest on your platform.

score 14 · Answer 6 · answered Jul 09 '18 at 13:02

Without NumPy:

>>> [2] * 3
[2, 2, 2]

answered by tnusraddinov

Suggesting `[v] * n` would be more directly relevant to the OP question. – lit Jul 09 '18 at 14:10
[This answer](https://stackoverflow.com/a/44921249/6634373) already mentioned this approach. – CommonSense Jul 09 '18 at 14:55
What if I want three rows but only one column? – wawawa Feb 02 '23 at 13:43
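One way to address the last comment, assuming a NumPy array is wanted: pass a two-element shape such as `(3, 1)` to `np.full`. The fill value 2 is just an example. For a plain list of lists, a comprehension builds an independent inner list per row, whereas `[[2]] * 3` would repeat a reference to one and the same inner list.

```python
import numpy as np

# NumPy: 3 rows, 1 column, every element equal to 2.
col = np.full((3, 1), 2)
print(col.shape)  # (3, 1)

# Plain-list equivalent: the comprehension creates a separate inner list per row.
rows = [[2] for _ in range(3)]
print(rows)  # [[2], [2], [2]]
```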

score 8 · Answer 7 · answered Oct 23 '12 at 15:51

You can use numpy.tile, e.g.:

import numpy

v = 7
rows = 3
cols = 5
a = numpy.tile(v, (rows,cols))
a
Out[1]: 
array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])

Although tile is meant to 'tile' an array (instead of a scalar, as in this case), it will do the job, creating pre-filled arrays of any size and dimension.
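For comparison, a short sketch of what `tile` is designed for: repeating a whole array (here an example pattern `[1, 2]`) along each axis. With a scalar, the "pattern" is a single element, which is why `tile` also works as a fill.

```python
import numpy as np

pattern = np.array([1, 2])

row = np.tile(pattern, 3)         # repeat along the last axis
grid = np.tile(pattern, (2, 2))   # 2 copies down, 2 copies across: shape (2, 4)
scalar_fill = np.tile(7, (3, 5))  # a scalar tiled into a 3x5 block

print(row)         # [1 2 1 2 1 2]
print(grid.shape)  # (2, 4)
```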

score 1 · Answer 8 · answered May 24 '22 at 08:21

We could also write

import numpy as np

v = 7
n = 5
a = np.linspace(v, v, n)
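`linspace` is meant for evenly spaced samples between two endpoints, so with equal endpoints every sample is `v`. It returns floats by default; a sketch comparing it with `np.full`, with `dtype=int` passed on the assumption that integers are wanted:

```python
import numpy as np

v, n = 7, 5

a = np.linspace(v, v, n)             # float64 by default
b = np.linspace(v, v, n, dtype=int)  # request integers explicitly

print(a)  # [7. 7. 7. 7. 7.]
print(np.array_equal(a, np.full(n, v)))  # True
```

Since the intent is "fill", not "sample a range", `np.full` states the purpose more directly.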

answered by Gabriel Soranzo

score 0 · Answer 9 · answered Jan 25 '22 at 19:07

You can also use np.broadcast_to.

To create an array of shape (dimensions) s and of value v, you can do (in your case, the array is 1-D, and s = (n,)):

a = np.broadcast_to(v, s).copy()

If a only needs to be read-only, you can use the following, which is much more efficient because no memory is allocated for the individual elements:

a = np.broadcast_to(v, s)

The advantage is that v can be given as a single number, but also as an array if different values are desired (as long as v.shape matches the tail of s).
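A short sketch of the points above, with `s` and the fill values chosen only as examples: `broadcast_to` on its own returns a read-only view (writing to it raises `ValueError`), `.copy()` yields an ordinary writable array, and `v` may itself be an array whose shape fits the trailing dimensions of `s`.

```python
import numpy as np

s = (2, 3)

view = np.broadcast_to(5, s)
print(view.flags.writeable)  # False: a read-only view over a single value

try:
    view[0, 0] = 1
except ValueError as exc:
    print("cannot write:", exc)

filled = np.broadcast_to(5, s).copy()
filled[0, 0] = 1  # the copy owns its memory and can be modified

# v as an array: its shape (3,) matches the last dimension of s.
rows = np.broadcast_to(np.array([1, 2, 3]), s)
print(rows)
```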

Bonus: if you want to force the dtype of the created array:

a = np.broadcast_to(np.asarray(v, dtype), s).copy()
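In the bonus line, `dtype` has to be defined beforehand. A sketch with `dtype = np.float32` picked purely as an example, checking that it gives the same result as `np.full` with an explicit dtype:

```python
import numpy as np

v, s = 7, (2, 3)
dtype = np.float32  # example choice; any NumPy dtype could go here

a = np.broadcast_to(np.asarray(v, dtype), s).copy()
b = np.full(s, v, dtype=dtype)

print(a.dtype, b.dtype)      # float32 float32
print(np.array_equal(a, b))  # True
```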
