How do I create an empty array and then append to it in NumPy?

Question

I want to create an empty array and append items to it, one at a time.

xs = []
for item in data:
    xs.append(item)

Can I use this list-style notation with NumPy arrays?

score 592 · Accepted Answer · edited Jun 20 '22 at 02:13

That is the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. To append rows or columns to an existing array, the entire array must be copied to a new, larger block of memory that has room for the new elements. This is very inefficient if done repeatedly.
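A quick way to see that copy, as a small sketch on a 1-D example: `np.append` never grows an array in place. It returns a new array, and `np.shares_memory` confirms that the result does not reuse the original's memory.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)  # builds a brand-new array; `a` itself is not modified

print(a)                       # [1 2 3]
print(b)                       # [1 2 3 4]
print(np.shares_memory(a, b))  # False: b's data lives in a new block
```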

Instead of appending rows, allocate a suitably sized array, and then assign to it row-by-row:

>>> import numpy as np

>>> a = np.zeros(shape=(3, 2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

>>> a[0] = [1, 2]
>>> a[1] = [3, 4]
>>> a[2] = [5, 6]

>>> a
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
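When the number of rows is known before the loop, the same idea can be written as a loop over the input. This is a sketch that assumes `data` is a hypothetical list of equal-length rows. It uses `np.empty` instead of `np.zeros` because every row is overwritten before it is read.

```python
import numpy as np

data = [(1, 2), (3, 4), (5, 6)]  # hypothetical input rows

# Allocate once. np.empty skips zero-filling, which is safe here because
# every row is assigned in the loop below.
a = np.empty((len(data), 2))
for i, row in enumerate(data):
    a[i] = row

print(a)
```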

edited by Mateen Ulhaq · answered Feb 20 '09 at 10:36 by Stephen Simmons

There is also numpy.empty() if you don't need to zero the array. – janneb Apr 19 '09 at 21:19
What's the benefit of using empty() over zeros()? – Zach Sep 01 '12 at 16:11
That if you're going to initialize it with your data straight away, you save the cost of zeroing it. – marcorossi Nov 13 '12 at 09:23
@maracorossi so `.empty()` means one can find random values in the cells, but the array is created quicker than e.g. with `.zeros()`? – user3085931 Jul 13 '16 at 17:38
@user3085931 yep! – Nathan Sep 30 '16 at 15:33
The difference between `numpy.empty` and `numpy.zeros` is quite similar to the difference between an empty set `{}` and a set with one element of zero `{0}`. – Causality Mar 21 '17 at 17:19
@Causality Empty set {} is well defined, but numpy.empty will return uninitialized data. – Philipp Claßen Dec 19 '17 at 09:40
Is it possible to create an empty NumPy array without initializing the shape? I want to create a flexible 3D NumPy array. – Mustafa Uçar Mar 01 '18 at 07:32
While presenting a valuable alternative, this does not answer the question. – Thiago Gouvea Apr 19 '18 at 22:51
I think having zeros by default is a dangerous idea, potentially leading to unintended numerical results. `None` avoids this, e.g. `a = np.array([[None]*prealloc_no])`. – A.L. Verminburger Apr 01 '20 at 18:35
Maybe worth mentioning that you can specify the dtype arg to control the element types. – Timothy Dalton Dec 29 '20 at 08:19
@Nathan `np.empty()` returns random numbers for each item; the items are not "None" as you would normally expect from the word "empty". I do not see why this should be faster than using `np.zeros()`. – questionto42 Dec 29 '21 at 20:52
@questionto42standswithUkraine The reason `empty()` is faster is because it merely provides a view on whatever random, meaningless garbage is already in memory in the allocated chunk. `empty()` just allocates the memory and returns, not caring what's already there. `zeros()` allocates the memory, then takes the extra step of initializing every value to zero. It's the difference between whether you do or do not take a bulldozer out to clear the land after getting your title to it registered at the county office. Zeroing it out is an extra step that takes time. – patrick-mooney Jun 06 '22 at 18:41
@patrick-mooney, but that zeroing it out is done in fast compiled code. Compared to time spent setting other values, iteratively or not, that zeroing time is minor. – hpaulj Oct 06 '22 at 05:44
@A.L.Verminburger, Your `a` filled with `None` will have an `object` dtype. `np.empty(prealloc_no, dtype=object)` does the same thing without first making the list of `None`. In this case `np.empty` does initialize the elements, because it isn't safe to have pointers to "random" places in memory. But we shouldn't be making object dtype arrays if we intend to do any sort of numeric work on them. – hpaulj Oct 06 '22 at 05:54
@MustafaUçar, a numpy array always has a `shape` (and `dtype`). There's no such thing as a "flexible array" (with unspecified shape). A python list also has a defined length, but with some extra space for appending more references. Thus list append is relatively efficient. Array growth always requires a copy, and should not be done iteratively. – hpaulj Oct 06 '22 at 06:08
Besides not using np.empty(), this answer assumes you know the shape (i.e. the number of rows) in advance, which is not always the case and is not stated in the question. Indeed, the question is about the general case, so this answer is not appropriate in my opinion. – Luis Vazquez Jan 31 '23 at 16:11
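The comments above note that the row count is not always known in advance. One common workaround, similar in spirit to how Python lists over-allocate, is to keep a preallocated buffer and double its capacity whenever it fills up, so the total amount of copying stays proportional to the final size. `GrowableRows` below is a hypothetical helper written for illustration, not a NumPy API, and it assumes every row has the same width.

```python
import numpy as np


class GrowableRows:
    """Hypothetical helper (not a NumPy API): collects fixed-width rows in a
    preallocated buffer and doubles the buffer whenever it fills up."""

    def __init__(self, width, dtype=float, capacity=16):
        self._buf = np.empty((capacity, width), dtype=dtype)
        self._n = 0  # number of rows filled so far

    def append(self, row):
        if self._n == self._buf.shape[0]:
            # Buffer is full: copy into one twice as large. Doubling keeps the
            # total copying proportional to the final number of rows.
            new_cap = max(1, 2 * self._buf.shape[0])
            bigger = np.empty((new_cap, self._buf.shape[1]), dtype=self._buf.dtype)
            bigger[: self._n] = self._buf[: self._n]
            self._buf = bigger
        self._buf[self._n] = row
        self._n += 1

    def to_array(self):
        # Trimmed copy holding only the rows that were actually appended.
        return self._buf[: self._n].copy()


rows = GrowableRows(width=2, capacity=2)
for i in range(5):  # pretend we did not know there would be 5 rows
    rows.append([i, i * i])

result = rows.to_array()
print(result.shape)  # (5, 2)
```

If the final size is known, plain preallocation is still simpler. This pattern only pays off when rows arrive one at a time from a source of unknown length.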

score 140 · Answer 2 · edited Feb 08 '10 at 17:39

A NumPy array is a very different data structure from a list and is designed to be used in different ways. Building an array with repeated calls to hstack is potentially very inefficient: every time you call it, all the data in the existing array is copied into a new one. (The append function has the same issue.) If you want to build up your matrix one column at a time, you might be better off keeping it in a list until it is finished, and only then converting it into an array.

e.g.

import numpy

mylist = []
for item in data:
    mylist.append(item)
mat = numpy.array(mylist)

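item can be a list, an array or any iterable, as long as each item has the same number of elements. Note that numpy.array stacks the items as rows, so if each item is meant to be a column, transpose the result.
In this particular case (data is some iterable holding the matrix columns) you can simply use

mat = numpy.array(data).T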

(Also note that using list as a variable name is probably not good practice since it masks the built-in type by that name, which can lead to bugs.)
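A runnable sketch of the list-then-convert pattern, assuming a small hypothetical `data`. It also shows the row/column point: `numpy.array` turns each item into a row, while `numpy.column_stack` treats each 1-D item as a column.

```python
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]  # hypothetical: two items of three elements each

rows = []
for item in data:
    rows.append(item)    # cheap list append, no array copying
mat = np.array(rows)     # one conversion at the end; items become rows
print(mat.shape)         # (2, 3)

# If each item is meant to be a column, stack them as columns instead:
cols = np.column_stack(data)
print(cols.shape)        # (3, 2)
```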

EDIT:

If for some reason you really do want to create an empty array, you can just use numpy.array([]), but this is rarely useful!

Are numpy arrays/matrices fundamentally different from Matlab ones? – levesque Nov 11 '10 at 03:20
If for some reason you need to define an empty array with a fixed width (e.g. for use with `np.concatenate()`), you can use `np.empty((0, some_width))`. Because the first dimension is 0, your first array won't contain garbage. – NumesSanguis Sep 01 '17 at 05:56
I think this is the right answer to the general case. It doesn't seem very elegant, but it's the only way that I have found to address this in numpy. – Luis Vazquez Jan 31 '23 at 16:13
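To make the fixed-width suggestion in the comment concrete, here is a small sketch comparing `np.array([])` with `np.empty((0, width))`. The width of 3 and the block of ones are arbitrary choices for illustration.

```python
import numpy as np

flat = np.array([])         # 1-D, shape (0,)
block = np.empty((0, 3))    # 2-D: zero rows, but a fixed width of 3

# Because the width is fixed, any (k, 3) array can be concatenated along axis 0.
new_rows = np.ones((2, 3))
block = np.concatenate([block, new_rows], axis=0)

print(flat.shape)   # (0,)
print(block.shape)  # (2, 3)
```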

score 87 · Answer 3 · edited Apr 14 '17 at 02:46

To create an empty multidimensional array in NumPy (e.g. a 2D array m*n to store your matrix) when you don't know how many rows m you will append, and you don't care about the computational cost Stephen Simmons mentioned (namely rebuilding the array at each append), you can set the dimension you want to append along to 0: X = np.empty(shape=[0, n]).

This way you can use, for example (here m = 5, which we assume we didn't know when creating the empty matrix, and n = 2):

import numpy as np

n = 2
X = np.empty(shape=[0, n])

for i in range(5):
    for j in range(2):
        X = np.append(X, [[i, j]], axis=0)

print(X)

which will give you:

[[ 0.  0.]
 [ 0.  1.]
 [ 1.  0.]
 [ 1.  1.]
 [ 2.  0.]
 [ 2.  1.]
 [ 3.  0.]
 [ 3.  1.]
 [ 4.  0.]
 [ 4.  1.]]
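For this particular loop the final size is actually computable, so no appending is needed at all. A sketch using `np.indices`, assuming the rows really are every (i, j) pair of a 5x2 grid in that order: it produces the same 10x2 array in one step.

```python
import numpy as np

# np.indices((5, 2)) has shape (2, 5, 2): the i- and j-coordinates of a 5x2 grid.
# Flattening each coordinate grid and transposing gives one (i, j) pair per row.
X = np.indices((5, 2)).reshape(2, -1).T.astype(float)
print(X.shape)  # (10, 2)
```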

answered Apr 10 '14 at 04:34 by Franck Dernoncourt

This should be the answer to the question OP asked, for the use case where you don't know #rows in advance, or want to handle the case that there are 0 rows – Hansang Aug 15 '19 at 04:52
While this does work as the OP asked, it is not a good answer. If you know the iteration range you know the target array size. – hpaulj Apr 20 '21 at 02:49
But there are of course plenty of examples where you don't know the iteration range and you don't care about the computational cost. Good answer in that case! – Tom Saenen Dec 05 '21 at 10:54

score 32 · Answer 4 · edited Feb 07 '16 at 03:13

I looked into this a lot because I needed to use a numpy.array as a set in one of my school projects and I needed it to be initialized empty... I didn't find any relevant answer here on Stack Overflow, so I started doodling something.

# Initialize your variable as an empty list first
In [32]: x=[]
# and now cast it as a numpy ndarray
In [33]: x=np.array(x)

The result will be:

In [34]: x
Out[34]: array([], dtype=float64)

Therefore you can directly initialize an np array as follows:

In [36]: x= np.array([], dtype=np.float64)

I hope this helps.
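Picking up the dtype point: a minimal sketch showing that `np.array([])` defaults to float64, and that this default decides what later appends turn into.

```python
import numpy as np

x = np.array([])                     # no dtype given
print(x.dtype)                       # float64

ints = np.array([], dtype=np.int64)  # explicit dtype
ints = np.append(ints, [2, 24])
print(ints.dtype)                    # int64

floats = np.append(x, [2, 24])       # ints appended to a float64 array...
print(floats)                        # [ 2. 24.]  ...come back as floats
```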

edited by gsamaras · answered Apr 10 '13 at 12:39 by Andrei Paga

This does not work for arrays, as in the question, but it can be useful for vectors. – divenex Dec 22 '17 at 16:31
`a=np.array([])` seems to default to `float64` – P i Sep 07 '19 at 09:58

score 17 · Answer 5 · edited Aug 21 '21 at 09:46

For creating an empty NumPy array without defining its shape, you can do the following:

arr = np.array([])

This way you know from the start that you will be using it as a NumPy array: the result is already an np.ndarray, without an extra [] 'dimension'.

To add a new element (here a number) to the array, you can do:

arr = np.append(arr, 1.5)

Note that NumPy has no such thing as an array without a defined shape; as @hpaulj mentioned, np.array([]) makes a 1-D array with shape (0,).

answered Apr 14 '20 at 06:23 by Pedram

No. `np.array([])` creates an array with shape (0,), a 1d array with 0 elements. There's no such thing as an array without a defined shape. And 2) does the same thing as 1). – hpaulj Apr 20 '21 at 02:42
It's true @hpaulj although the whole point of the discussion is to not think mentally about the shape when you're creating one. worth mentioning that anyway. – Pedram Aug 21 '21 at 09:27

score 10 · Answer 6 · edited Aug 30 '13 at 10:20

You can use the append function. For rows (note that a has to be 2-D, here a single row, for axis=0 to work):

>>> import numpy as np
>>> a = np.array([[10, 20, 30]])
>>> a = np.append(a, [[1, 2, 3]], axis=0)
>>> a
array([[10, 20, 30],
       [ 1,  2,  3]])

For columns:

>>> np.append(a, [[15], [15]], axis=1)
array([[10, 20, 30, 15],
       [ 1,  2,  3, 15]])
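A small sketch of why a 1-D starting array does not work here: with an `axis`, `np.append` needs both inputs to have the same number of dimensions. `np.atleast_2d` is one way to turn a 1-D array into a single row first.

```python
import numpy as np

a = np.array([10, 20, 30])   # 1-D, shape (3,)
raised = False
try:
    np.append(a, [[1, 2, 3]], axis=0)
except ValueError as err:
    # (3,) and (1, 3) differ in number of dimensions, so concatenation fails.
    raised = True
    print("ValueError:", err)

a2 = np.atleast_2d(a)        # shape (1, 3): the same data as a single row
a2 = np.append(a2, [[1, 2, 3]], axis=0)
print(a2.shape)              # (2, 3)
```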

EDIT
Of course, as mentioned in other answers, unless you're doing some processing (ex. inversion) on the matrix/array EVERY time you append something to it, I would just create a list, append to it then convert it to an array.

edited by pradyunsg · answered Feb 20 '09 at 10:27 by Il-Bhima

How does this answer the question? I don't see the part about empty arrays – KansaiRobot Sep 10 '21 at 05:07

score 5 · Answer 7 · answered Feb 05 '20 at 08:41

Here is a workaround that makes NumPy arrays behave more like lists:

import numpy as np
np_arr = np.array([])
np_arr = np.append(np_arr, 2)
np_arr = np.append(np_arr, 24)
print(np_arr)

OUTPUT: [ 2. 24.]

answered by Darius

Stay away from `np.append`. It's not a list append clone, despite the poorly chosen name. – hpaulj Apr 20 '21 at 02:45
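One concrete way `np.append` differs from `list.append`, shown as a small sketch: without an `axis` argument it flattens both inputs before joining them, so a 2-D array silently becomes 1-D.

```python
import numpy as np

m = np.zeros((2, 2))
flat = np.append(m, [[1, 1]])           # no axis: both inputs are flattened first
print(flat.shape)                       # (6,)

grown = np.append(m, [[1, 1]], axis=0)  # with axis: shapes must line up
print(grown.shape)                      # (3, 2)
```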

score 3 · Answer 8 · edited Oct 01 '15 at 18:07

You can apply it to build any kind of array, like zeros:

a = range(5)
a = [i*0 for i in a]
print(a)
[0, 0, 0, 0, 0]

answered Oct 01 '15 at 17:50 by Ali G

If you want to do that in pure python, `a= [0] * 5` is the simple solution – Makers_F Dec 22 '15 at 03:46

score 3 · Answer 9 · answered Sep 06 '11 at 21:20

If you absolutely don't know the final size of the array, you can increment the size of the array like this:

import numpy

my_arr = numpy.zeros((0, 5))
for i in range(3):
    my_arr = numpy.concatenate((my_arr, numpy.ones((1, 5))))
print(my_arr)

[[ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]]

Notice the 0 in numpy.zeros((0, 5)): the array starts out with zero rows of width 5.
numpy.append is another option. It calls numpy.concatenate.

score 2 · Answer 10 · answered Sep 11 '16 at 00:28

Depending on what you are using this for, you may need to specify the data type (see 'dtype').

For example, to create a 2D array of 8-bit values (suitable for use as a monochrome image):

myarray = numpy.empty(shape=(H,W),dtype='u1')

For an RGB image, include the number of color channels in the shape: shape=(H,W,3)

You may also want to consider zero-initializing with numpy.zeros instead of using numpy.empty, which leaves the array contents uninitialized (arbitrary values) until you assign them.
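A runnable version of the image example, assuming a hypothetical 4x6 image size. It uses `np.zeros` as suggested, and the sizes confirm that `'u1'` (the same dtype as `np.uint8`) takes one byte per element.

```python
import numpy as np

H, W = 4, 6  # hypothetical image height and width

gray = np.zeros((H, W), dtype=np.uint8)     # 'u1' is the same dtype as np.uint8
rgb = np.zeros((H, W, 3), dtype=np.uint8)   # one extra axis for the color channels

print(gray.dtype, gray.shape, gray.nbytes)  # uint8 (4, 6) 24
print(rgb.shape, rgb.nbytes)                # (4, 6, 3) 72
```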

score 2 · Answer 11 · edited Mar 11 '21 at 07:34

Another simple way to create an empty array whose elements can hold arrays (or other Python objects) is:

import numpy as np
np.empty((2,3), dtype=object)
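A short sketch of what such an object array can hold. Note that `np.empty` fills object arrays with `None` rather than arbitrary memory. As hpaulj points out in a comment on the accepted answer, object arrays are not suited to numeric work.

```python
import numpy as np

cells = np.empty((2, 3), dtype=object)  # every cell starts out as None
cells[0, 0] = np.arange(3)              # a cell can hold an array...
cells[1, 2] = "text"                    # ...or any other Python object

print(cells[0, 1] is None)              # True
print(cells[0, 0])                      # [0 1 2]
```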

answered Mar 11 '21 at 07:11 by SteveTz

score 1 · Answer 12 · answered Oct 09 '18 at 06:43

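I think you want to handle most of the work with lists and then use the result as a matrix. Maybe this is a way:

import numpy as np

columns = [(1, 2, 3), (4, 5, 6)]  # hypothetical example data

ur_list = []
for col in columns:
    ur_list.append(list(col))

mat = np.matrix(ur_list)

(NumPy's documentation no longer recommends np.matrix; np.array(ur_list) gives a regular 2-D array.)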

answered by runo

score 1 · Answer 13 · answered Aug 30 '19 at 13:47

I think you can create an empty NumPy array like this:

>>> import numpy as np
>>> empty_array= np.zeros(0)
>>> empty_array
array([], dtype=float64)
>>> empty_array.shape
(0,)

This form is useful as a starting point when you want to append to a NumPy array in a loop.

answered by veeresh d

score 0 · Answer 14 · answered Jun 19 '20 at 23:38

Perhaps what you are looking for is something like this:

x=np.array(0)

In this way you can create an array without any element. It is similar to:

x=[]

This way you will be able to append new elements to your array later.

answered by Edgar Duarte

No, your `x` is a an array with shape (), and one element. It is more like `0` than `[]`. You could call it a 'scalar array'. – hpaulj Apr 20 '21 at 02:44
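To make the comment concrete, here is a quick sketch comparing three of the constructions from this page by shape and element count.

```python
import numpy as np

scalar = np.array(0)  # 0-d array: no axes, but one element
empty = np.array([])  # 1-d array with zero elements
zeros0 = np.zeros(0)  # same shape as np.array([])

print(scalar.shape, scalar.size)  # () 1
print(empty.shape, empty.size)    # (0,) 0
print(zeros0.shape, zeros0.size)  # (0,) 0
```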

score 0 · Answer 15 · answered May 20 '21 at 16:33

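The simplest way

Input:

import numpy as np
n_cols = 3                                 # hypothetical number of columns
data = np.zeros((0, n_cols), dtype=float)  # (rows, cols)
data.shape

Output:
(0, 3)

Input:

n_files = 4                                # hypothetical number of files
for i in range(n_files):
    new_data = np.ones((2, n_cols))        # hypothetical: 2 new rows per file
    data = np.append(data, new_data, axis=0)

Note that the number of columns has to be fixed up front: appending rows along axis=0 to a (0, 0) array fails unless the new rows also have 0 columns.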

answered by user3810512

Please don't recommend using `np.append` in a loop. – hpaulj Oct 06 '22 at 05:37

score 0 · Answer 16 · answered Jul 19 '23 at 11:12

You might be better off using np.vstack in the general case, where you want to add a whole 2-D block of rows at a time. For example, let's say you generate batches and accumulate them:

import numpy as np
embeddings = np.empty((0, 768), dtype=np.float32)
for i in range(10):
    batch = generate()  # hypothetical batch generator; shape: (64, 768)
    embeddings = np.vstack((embeddings, batch))
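Each `np.vstack` call inside that loop still copies everything accumulated so far. Here is a sketch of the list-then-concatenate alternative from the other answers. It uses a hypothetical `generate` that returns (64, 768) float32 batches, so the data copy happens only once, at the end.

```python
import numpy as np


def generate():
    # Hypothetical stand-in for the batch generator in the answer.
    return np.ones((64, 768), dtype=np.float32)


batches = []
for i in range(10):
    batches.append(generate())  # cheap list append; no array copying yet

embeddings = np.concatenate(batches, axis=0)  # single copy at the end
print(embeddings.shape)  # (640, 768)
```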
