441

I want to create an empty array and append items to it, one at a time.

xs = []
for item in data:
    xs.append(item)

Can I use this list-style notation with NumPy arrays?

Mateen Ulhaq
Ben

16 Answers

592

That is the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. To append rows or columns to an existing array, the entire array has to be copied to a new, larger block of memory that has room for the new elements. This is very inefficient if done repeatedly.

Instead of appending rows, allocate a suitably sized array, and then assign to it row-by-row:

>>> import numpy as np

>>> a = np.zeros(shape=(3, 2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

>>> a[0] = [1, 2]
>>> a[1] = [3, 4]
>>> a[2] = [5, 6]

>>> a
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
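A minimal sketch of the same preallocation using np.empty, which the comments below mention as an alternative to np.zeros. It assumes the number of rows is known before the loop; `rows` is hypothetical sample data standing in for your input.

```python
import numpy as np

# Hypothetical input: an iterable of rows whose count is known up front.
rows = [[1, 2], [3, 4], [5, 6]]

# Allocate once. np.empty skips zero-filling, so every cell must be written
# before it is read. The dtype is given explicitly rather than left implicit.
a = np.empty(shape=(len(rows), 2), dtype=np.float64)

# Fill row by row; the array is never reallocated or copied inside the loop.
for i, row in enumerate(rows):
    a[i] = row

print(a)
```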
Mateen Ulhaq
Stephen Simmons
  • There is also numpy.empty() if you don't need to zero the array. – janneb Apr 19 '09 at 21:19
  • What's the benefit of using empty() over zeros()? – Zach Sep 01 '12 at 16:11
  • That if you're going to initialize it with your data straight away, you save the cost of zeroing it. – marcorossi Nov 13 '12 at 09:23
  • @maracorossi so `.empty()` means one can find random values in the cells, but the array is created quicker than e.g. with `.zeros()`? – user3085931 Jul 13 '16 at 17:38
  • @user3085931 yep! – Nathan Sep 30 '16 at 15:33
  • The difference between `numpy.empty` and `numpy.zeros` is quite similar to the difference between an empty set `{}` and a set with one element of zero `{0}`. – Causality Mar 21 '17 at 17:19
  • @Causality Empty set {} is well defined, but numpy.empty will return uninitialized data. – Philipp Claßen Dec 19 '17 at 09:40
  • Is it possible to create an empty NumPy array without initializing the shape? I want to create a flexible 3D NumPy array. – Mustafa Uçar Mar 01 '18 at 07:32
  • While presenting a valuable alternative, this does not answer the question. – Thiago Gouvea Apr 19 '18 at 22:51
  • I think having zeros by default is a dangerous idea, potentially leading to unintended numerical results. `None` avoids this, e.g. `a = np.array([[None]*prealloc_no])`. – A.L. Verminburger Apr 01 '20 at 18:35
  • Maybe worth mentioning that you can specify the dtype arg to control the element types. – Timothy Dalton Dec 29 '20 at 08:19
  • @Nathan `np.empty()` returns random numbers for each item; the items are not "None" as you would normally expect from the word "empty". I do not see why this should be faster than using `np.zeros()`. – questionto42 Dec 29 '21 at 20:52
  • @questionto42standswithUkraine The reason `empty()` is faster is that it merely provides a view on whatever random, meaningless garbage is already in memory in the allocated chunk. `empty()` just allocates the memory and returns, not caring what's already there. `zeros()` allocates the memory, then takes the extra step of initializing every value to zero. It's the difference between whether you do or do not take a bulldozer out to clear the land after getting your title to it registered at the county office. Zeroing it out is an extra step that takes time. – patrick-mooney Jun 06 '22 at 18:41
  • @patrick-mooney, but that zeroing out is done in fast compiled code. Compared to time spent setting other values, iteratively or not, that zeroing time is minor. – hpaulj Oct 06 '22 at 05:44
  • @A.L.Verminburger, your `a` filled with `None` will have an `object` dtype. `np.empty(prealloc_no, dtype=object)` does the same thing without first making the list of `None`. In this case `np.empty` does initialize the elements, because it isn't safe to have pointers to "random" places in memory. But we shouldn't be making object dtype arrays if we intend to do any sort of numeric work on them. – hpaulj Oct 06 '22 at 05:54
  • @MustafaUçar, a numpy array always has a `shape` (and `dtype`). There's no such thing as a "flexible array" (with unspecified shape). A python list also has a defined length, but with some extra space for appending more references. Thus list append is relatively efficient. Array growth always requires a copy, and should not be done iteratively. – hpaulj Oct 06 '22 at 06:08
  • Besides not using np.empty(), this answer assumes you know the shape (i.e. the number of rows) in advance, which is not always the case and not stated in the question. Indeed the question is about the general case, so this answer is not appropriate in my opinion. – Luis Vazquez Jan 31 '23 at 16:11
140

A NumPy array is a very different data structure from a list and is designed to be used in different ways. Building an array with repeated calls to hstack is potentially very inefficient: every time you call it, all the data in the existing array is copied into a new one. (The append function has the same issue.) If you want to build up your matrix one column at a time, you are best off keeping it in a list until it is finished, and only then converting it into an array.

e.g.


import numpy

mylist = []
for item in data:
    mylist.append(item)
mat = numpy.array(mylist)

item can be a list, an array or any iterable, as long as each item has the same number of elements.
In this particular case (data is some iterable holding the matrix columns) you can simply use


mat = numpy.array(data)

(Also note that using list as a variable name is probably not good practice since it masks the built-in type by that name, which can lead to bugs.)
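A small sketch of the list-then-convert approach. One detail worth checking: np.array stacks the collected items as rows, so if each item is a column (as in the case above), transposing with .T gives the column layout. `columns` is hypothetical sample data.

```python
import numpy as np

# Hypothetical input: three columns of a 2-row matrix.
columns = [[1, 2], [3, 4], [5, 6]]

# Appending to a Python list is cheap (amortized O(1)), so collect first...
collected = []
for col in columns:
    collected.append(col)

# ...then convert once. np.array stacks the items as rows: shape (3, 2).
as_rows = np.array(collected)

# Transpose if the items were meant to be columns: shape (2, 3).
mat = as_rows.T
print(mat)
```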

EDIT:

If for some reason you really do want to create an empty array, you can just use numpy.array([]), but this is rarely useful!

Peter Mortensen
Greg Ball
  • Are numpy arrays/matrices fundamentally different from Matlab ones? – levesque Nov 11 '10 at 03:20
  • If for some reason you need to define an empty array, but with a fixed width (e.g. for `np.concatenate()`), you can use `np.empty((0, some_width))`. The 0 ensures your first array won't contain garbage. – NumesSanguis Sep 01 '17 at 05:56
  • I think this is the right answer for the general case. It doesn't seem very elegant, but it's the only way that I have found to address this in numpy. – Luis Vazquez Jan 31 '23 at 16:13
87

To create an empty multidimensional array in NumPy (e.g. an m×n 2D array to store your matrix) when you don't know m, the number of rows you will append, and don't care about the computational cost Stephen Simmons mentioned (namely rebuilding the array at each append), you can set the dimension you want to append along to 0: X = np.empty(shape=[0, n]).

This way you can use, for example (here m = 10 rows, from 5 × 2 appends, which we assume we didn't know when creating the empty matrix, and n = 2):

import numpy as np

n = 2
X = np.empty(shape=[0, n])

for i in range(5):
    for j in range(2):
        X = np.append(X, [[i, j]], axis=0)

print(X)

which will give you:

[[ 0.  0.]
 [ 0.  1.]
 [ 1.  0.]
 [ 1.  1.]
 [ 2.  0.]
 [ 2.  1.]
 [ 3.  0.]
 [ 3.  1.]
 [ 4.  0.]
 [ 4.  1.]]
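When the row count really is unknown, here is a sketch of an alternative that avoids re-copying on each append: collect rows in a list and convert once. The reshape(-1, n) call also covers the zero-row case raised in the comments, since an empty list would otherwise give shape (0,) rather than (0, n). `make_rows` and `collect` are hypothetical helpers for illustration.

```python
import numpy as np

n = 2

def make_rows(count):
    # Hypothetical producer: yields rows; the caller doesn't know how many.
    for i in range(count):
        yield [i, i * 10]

def collect(rows, n):
    buf = []
    for row in rows:
        buf.append(row)
    # reshape(-1, n) turns an empty result (shape (0,)) into shape (0, n)
    # and leaves a non-empty (m, n) result unchanged.
    return np.array(buf, dtype=float).reshape(-1, n)

X = collect(make_rows(5), n)
empty = collect(make_rows(0), n)
print(X.shape, empty.shape)   # (5, 2) (0, 2)
```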
Franck Dernoncourt
  • This should be the answer to the question OP asked, for the use case where you don't know #rows in advance, or want to handle the case that there are 0 rows. – Hansang Aug 15 '19 at 04:52
  • While this does work as the OP asked, it is not a good answer. If you know the iteration range you know the target array size. – hpaulj Apr 20 '21 at 02:49
  • But there are of course plenty of examples where you don't know the iteration range and you don't care about the computational cost. Good answer in that case! – Tom Saenen Dec 05 '21 at 10:54
32

I looked into this a lot because I needed to use a numpy.array as a set in one of my school projects, and I needed it to be initialized empty. I didn't find any relevant answer here on Stack Overflow, so I started experimenting.

# Initialize your variable as an empty list first
In [32]: x=[]
# and now cast it as a numpy ndarray
In [33]: x=np.array(x)

The result will be:

In [34]: x
Out[34]: array([], dtype=float64)

Therefore you can directly initialize an np array as follows:

In [36]: x= np.array([], dtype=np.float64)

I hope this helps.
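Since the goal here was a NumPy array used as a set, one option (not part of the answer above) is np.union1d, which returns the sorted, de-duplicated union of its inputs as a new array. A minimal sketch starting from the empty float64 array:

```python
import numpy as np

# Start from the empty float64 array shown above.
s = np.array([], dtype=np.float64)

# np.union1d returns the sorted, de-duplicated union as a new array.
s = np.union1d(s, [3, 1, 3])
s = np.union1d(s, [2, 1])

# Membership test for a single value.
print(s, 2 in s)
```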

gsamaras
Andrei Paga
17

For creating an empty NumPy array without defining its shape you can do the following:

arr = np.array([])

This form is preferred because you know you will be using it as a NumPy array. NumPy converts it to the np.ndarray type, without an extra [] 'dimension'.

To add a new element to the array, you can do:

arr = np.append(arr, 'new element')

Note that under the hood there's no such thing as an array without a defined shape. As @hpaulj mentioned, this also makes a rank-one (1-D) array, with shape (0,).
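A quick sketch of what the comments below point out: np.array([]) is a 1-D array of shape (0,), and np.append without an axis flattens its inputs and returns a new 1-D array.

```python
import numpy as np

arr = np.array([])
print(arr.shape, arr.ndim, arr.dtype)   # (0,) 1 float64

# Without axis=, np.append flattens both inputs and returns a NEW 1-D array,
# even when the appended value is 2-D.
arr = np.append(arr, 5)
arr = np.append(arr, [[6, 7]])
print(arr)   # [5. 6. 7.]
```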

Pedram
  • No, `np.array([])` creates an array with shape (0,), a 1d array with 0 elements. There's no such thing as an array without a defined shape. And 2) does the same thing as 1). – hpaulj Apr 20 '21 at 02:42
  • It's true @hpaulj, although the whole point of the discussion is to not have to think about the shape when you're creating one. It's worth mentioning anyway. – Pedram Aug 21 '21 at 09:27
10

You can use the append function. For rows:

>>> import numpy as np
>>> a = np.array([[10, 20, 30]])
>>> a = np.append(a, [[1, 2, 3]], axis=0)
>>> a
array([[10, 20, 30],
       [ 1,  2,  3]])

For columns:

>>> np.append(a, [[15], [15]], axis=1)
array([[10, 20, 30, 15],
       [ 1,  2,  3, 15]])
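np.append with an axis delegates to np.concatenate, so both inputs must already have the same number of dimensions and matching sizes on the other axes. A sketch comparing the two, including the error you get when the starting array is 1-D:

```python
import numpy as np

a = np.array([[10, 20, 30]])   # 2-D: one row, three columns

# Row append: the new row must be 2-D with 3 columns.
rows = np.append(a, [[1, 2, 3]], axis=0)
same_rows = np.concatenate((a, [[1, 2, 3]]), axis=0)

# Column append: the new column needs one entry per existing row.
cols = np.append(rows, [[15], [15]], axis=1)

# A 1-D starting array cannot take a 2-D value along axis=0.
try:
    np.append(np.array([10, 20, 30]), [[1, 2, 3]], axis=0)
except ValueError as exc:
    print("ValueError:", exc)

print(np.array_equal(rows, same_rows), cols.shape)
```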

EDIT
Of course, as mentioned in other answers, unless you're doing some processing (e.g. inversion) on the matrix/array EVERY time you append something to it, I would just create a list, append to it, then convert it to an array.

pradyunsg
Il-Bhima
5

Here is a workaround to make NumPy arrays behave more like lists:

import numpy as np

np_arr = np.array([])
np_arr = np.append(np_arr, 2)
np_arr = np.append(np_arr, 24)
print(np_arr)

OUTPUT: [ 2. 24.]
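Two things this workaround hides, sketched below: np.append returns a new array rather than modifying its argument, so the result must be reassigned, and each call copies the whole array, which makes a loop of appends quadratic in the final length.

```python
import numpy as np

np_arr = np.array([])

# np.append does not modify its argument; without reassignment the value is lost.
np.append(np_arr, 2)
print(np_arr.size)   # 0

# Each reassignment copies the whole array, so n appends cost O(n^2) overall.
for value in (2, 24):
    np_arr = np.append(np_arr, value)
print(np_arr.size)   # 2
```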

Darius
  • Stay away from `np.append`. It's not a list append clone, despite the poorly chosen name. – hpaulj Apr 20 '21 at 02:45
3

You can apply a list comprehension to build any kind of list, such as one of zeros:

a = range(5)
a = [i * 0 for i in a]
print(a)
[0, 0, 0, 0, 0]
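For a NumPy result, the list can be converted with np.array, or np.zeros can build the same values directly. A sketch comparing the two (dtype=int is chosen here to match the integer list):

```python
import numpy as np

# The list-comprehension result converted to an array...
from_list = np.array([i * 0 for i in range(5)])

# ...versus asking NumPy for zeros directly.
direct = np.zeros(5, dtype=int)

print(from_list, direct, np.array_equal(from_list, direct))
```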
Ali G
3

If you really don't know the final size of the array, you can grow it like this:

import numpy

my_arr = numpy.zeros((0, 5))
for i in range(3):
    my_arr = numpy.concatenate((my_arr, numpy.ones((1, 5))))
print(my_arr)

[[ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]]
  • Notice the 0 in the first line.
  • numpy.append is another option. It calls numpy.concatenate.
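A sketch of why the 0 in the first line matters, using a hypothetical helper `grow` that repeats the loop above: starting from zero rows gives exactly the appended rows, while starting from one zero-filled row leaves that row in the result.

```python
import numpy as np

def grow(start):
    # Hypothetical helper: the same three concatenations as the loop above.
    arr = start
    for _ in range(3):
        arr = np.concatenate((arr, np.ones((1, 5))))
    return arr

from_empty = grow(np.zeros((0, 5)))   # 0 rows to begin with
from_one = grow(np.zeros((1, 5)))     # starts with one all-zero row

print(from_empty.shape, from_one.shape)   # (3, 5) (4, 5)
print(from_one[0])                        # the leftover zero row
```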
cyborg
2

Depending on what you are using this for, you may need to specify the data type (see 'dtype').

For example, to create a 2D array of 8-bit values (suitable for use as a monochrome image):

myarray = numpy.empty(shape=(H,W),dtype='u1')

For an RGB image, include the number of color channels in the shape: shape=(H,W,3)

You may also want to consider zero-initializing with numpy.zeros instead of using numpy.empty. See the note here.
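A minimal sketch with hypothetical dimensions H = 4 and W = 6. The type code 'u1' is the same dtype as np.uint8 (one byte per value, 0 to 255); np.zeros is used here so the pixels start at a defined value.

```python
import numpy as np

H, W = 4, 6   # hypothetical image height and width

# Monochrome: one byte per pixel, starting at 0 (black).
mono = np.zeros(shape=(H, W), dtype='u1')

# RGB: a trailing axis of length 3 for the color channels.
rgb = np.zeros(shape=(H, W, 3), dtype=np.uint8)

print(mono.dtype, mono.shape, rgb.shape, rgb.nbytes)
```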

Brent Bradburn
2

Another simple way to create an empty array that can hold arrays as elements is:

import numpy as np
np.empty((2,3), dtype=object)
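A sketch of what such an object array can hold: np.empty with dtype=object fills the cells with None (as noted in a comment under the first answer), and each cell can then be assigned any Python object, such as arrays of different lengths.

```python
import numpy as np

grid = np.empty((2, 3), dtype=object)
print(grid[0, 0])   # None

# Each cell holds an arbitrary Python object, here arrays of different lengths.
grid[0, 0] = np.arange(3)
grid[1, 2] = np.arange(5)
print(grid[0, 0].shape, grid[1, 2].shape)
```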
SteveTz
1

I think you want to handle most of the work with lists, then use the result as a matrix. Maybe this is one way:

import numpy as np

ur_list = []
for col in columns:
    ur_list.append(list(col))

mat = np.matrix(ur_list)
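The NumPy documentation no longer recommends np.matrix, even for linear algebra, in favor of regular arrays. A sketch of the same approach with np.array, where `columns` is hypothetical sample data and the @ operator does the matrix multiplication:

```python
import numpy as np

# Hypothetical columns; each becomes a row when converted.
columns = [(1, 2, 3), (4, 5, 6)]

ur_list = []
for col in columns:
    ur_list.append(list(col))

# A plain 2-D ndarray; @ performs matrix multiplication on regular arrays.
mat = np.array(ur_list)
gram = mat @ mat.T
print(mat.shape, gram)
```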
runo
1

I think you can create an empty NumPy array like this:

>>> import numpy as np
>>> empty_array= np.zeros(0)
>>> empty_array
array([], dtype=float64)
>>> empty_array.shape
(0,)

This format is useful when you want to append NumPy arrays to it in a loop.

veeresh d
0

Perhaps what you are looking for is something like this:

x=np.array(0)

In this way you can create an array without any element. It is similar to:

x=[]

This way you will be able to append new elements to your array later.

  • No, your `x` is an array with shape () and one element. It is more like `0` than `[]`. You could call it a 'scalar array'. – hpaulj Apr 20 '21 at 02:44
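A short sketch of the difference the comment describes, comparing np.array(0) with np.array([]), plus what happens when you append to the scalar array:

```python
import numpy as np

scalar = np.array(0)   # 0-d array: shape (), holds one element
empty = np.array([])   # 1-d array: shape (0,), holds no elements

print(scalar.shape, scalar.size, scalar.ndim)   # () 1 0
print(empty.shape, empty.size, empty.ndim)      # (0,) 0 1

# Appending to the scalar array keeps its initial 0.
grown = np.append(scalar, 5)
print(grown)   # [0 5]
```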
0

The simplest way

Input:

import numpy as np
n_cols = 3   # hypothetical: must equal the number of columns in each new_data
data = np.zeros((0, n_cols), dtype=float)   # (rows, cols)
data.shape

Output:
(0, 3)

Input:

for i in range(n_files):
    data = np.append(data, new_data, axis=0)
user3810512
0

You might be better off using vstack in the general case, where each addition is itself a 2-D array of rows. For example, let's say you generate batches and accumulate them.

import numpy as np
embeddings = np.empty((0, 768), dtype=np.float32)
for i in range(10):
    batch = generate() # shape: (64, 768)
    embeddings = np.vstack((embeddings, batch))
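Each np.vstack in the loop above copies everything accumulated so far. A sketch of a variant that keeps the batches in a list and stacks once at the end; `generate` here is a hypothetical stand-in that returns random float32 batches of shape (64, 768).

```python
import numpy as np

rng = np.random.default_rng(0)

def generate():
    # Hypothetical stand-in for the batch producer: shape (64, 768).
    return rng.standard_normal((64, 768)).astype(np.float32)

# Keep the batches in a list; nothing is copied while accumulating.
batches = [generate() for _ in range(10)]

# One vstack at the end copies each batch exactly once.
embeddings = np.vstack(batches) if batches else np.empty((0, 768), dtype=np.float32)
print(embeddings.shape, embeddings.dtype)   # (640, 768) float32
```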
Shital Shah