Insert or append empty rows to a numpy array

Question

There are existing questions about using np.append to add rows to an initially empty array, such as "How to add a new row to an empty numpy array".

Instead, my question is how to allocate extra empty space at the end of an array so that it can later be assigned to.

An example:

import numpy as np

# Inefficient: The data in new_rows gets copied twice.
array = np.arange(6).reshape(2, 3)
new_rows = np.square(array)
new = np.concatenate((array, new_rows), axis=0)

# Instead, we would like something like the following:

def append_new_empty_rows(array, num_rows):
  new_rows = np.empty_like(array, shape=(num_rows, array.shape[1]))
  return np.concatenate((array, new_rows), axis=0)

array = np.arange(6).reshape(2, 3)
new = append_new_empty_rows(array, 2)
np.square(array[:2], out=new[2:])

However, np.concatenate() likely still copies the uninitialized rows into its result. Is there something like an np.append_empty()?

Careful, `np.empty` makes a whole new array. It's like `np.zeros` except the element values are unpredictable. You aren't saving any memory or copies by using it. — hpaulj, Apr 20 '21 at 02:22
That link has a lot of bad answers. The only good ones stick with list append, and make an array at the end. — hpaulj, Apr 20 '21 at 02:53
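
The list-based pattern mentioned in the comment above can be sketched as follows. This is only a minimal illustration: the row values and the names `rows` and `result` are made up for the example.

```python
import numpy as np

# Collect rows in a plain Python list; list.append does not copy
# the rows that were collected earlier.
rows = []
for i in range(4):
    rows.append([i, i * i, i * i * i])  # illustrative row values

# Build the array once, after all rows are known.
result = np.array(rows)
print(result.shape)  # (4, 3)
```

The array is allocated a single time at the end, instead of being reallocated on every np.append call.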

score 0 · Answer 1 · edited Apr 20 '21 at 02:35

Why don't you do it as follows:

array = np.arange(6).reshape(2, 3)
n_rows = 4
new = np.vstack([array, np.zeros((n_rows, array.shape[1]) )])

The new array will be this:

array([[0., 1., 2.],
       [3., 4., 5.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

If what you want is to save some space, then you should consider using the out parameter provided by concatenate. So it would be like this:

array = np.arange(6).reshape(2, 3)
n_rows = 4
np.concatenate([array, np.zeros((n_rows, array.shape[1]))], out=array)

As you can see, the only assignment is array and there is not any copy created. It overwrites array instead...

Your last block raises a ValueError. The `out` isn't big enough. — hpaulj, Apr 20 '21 at 02:34
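
The failure described in the comment above can be reproduced, and a working use of `out` sketched, roughly as follows. The names `zeros`, `raised`, `out` and `result` are chosen for this example, and the sketch assumes a NumPy version in which np.concatenate accepts `out`.

```python
import numpy as np

array = np.arange(6).reshape(2, 3)
n_rows = 4
zeros = np.zeros((n_rows, array.shape[1]), dtype=array.dtype)

# Passing the 2-row input as `out` fails: the result needs 6 rows.
try:
    np.concatenate([array, zeros], out=array)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# `out` works when it already has the full result shape.
out = np.empty((array.shape[0] + n_rows, array.shape[1]), dtype=array.dtype)
result = np.concatenate([array, zeros], out=out)
print(np.shares_memory(result, out))  # True
```

Note that `out` only selects where the result is written; the rows of `array` are still copied into it.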

score 0 · Answer 2 · answered Apr 20 '21 at 02:31

Here's what you are doing:

Make an array that's big enough for both pieces. np.zeros avoids any illusions that we are saving memory or work.

In [15]: arr1 = np.zeros((4,3), int)
In [16]: arr1
Out[16]: 
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

Now copy values from the initial (2,3) array, arr (that is, np.arange(6).reshape(2, 3)), to the first part of arr1:

In [17]: arr1[:2] = arr
In [18]: arr1
Out[18]: 
array([[0, 1, 2],
       [3, 4, 5],
       [0, 0, 0],
       [0, 0, 0]])

Then use the out parameter to write the squared values into the second part:

In [19]: np.square(arr[:2], out=arr1[2:])
Out[19]: 
array([[ 0,  1,  4],
       [ 9, 16, 25]])

In [21]: arr1
Out[21]: 
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 0,  1,  4],
       [ 9, 16, 25]])

I don't see how that saves any effort or memory compared to:

In [22]: np.concatenate((arr, np.square(arr)), axis=0)
Out[22]: 
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 0,  1,  4],
       [ 9, 16, 25]])

Under the covers, concatenate must be making a result array of the right size and copying the pieces into it. There's really no getting around that if you want an array that contains both arr and np.square(arr).
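
A small check, using np.shares_memory, illustrates that the concatenated result is a separate array rather than a view of its inputs. The name `squares` is introduced only for this sketch.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
squares = np.square(arr)

result = np.concatenate((arr, squares), axis=0)

# The result is a freshly allocated array; it shares no memory with its inputs.
print(np.shares_memory(result, arr))      # False
print(np.shares_memory(result, squares))  # False

# Modifying the result therefore leaves the inputs untouched.
result[0, 0] = -1
print(arr[0, 0])  # 0
```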

I like this direction. However, wouldn't it be best to allocate `arr1` using `np.empty()`, to save the work of initializing the array, because both parts of the array will be written to exactly once? Otherwise, the array is first written with zeros, then overwritten again with the final data. (That's the motivation of `np.empty()`.) — Hugues, Apr 20 '21 at 05:44
I've found that the time difference is small enough that it's not worth the confusion it can cause. — hpaulj, Apr 20 '21 at 06:48
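
One way to check the np.zeros versus np.empty difference on a given machine is a small timeit comparison outside IPython. This is a rough sketch: the function names `fill_with_zeros` and `fill_with_empty`, the array size, and the repeat count are arbitrary choices for the example, and the timings will vary with hardware and NumPy version.

```python
import timeit
import numpy as np

array = np.ones((1000, 1000))
rows = array.shape[0]

def fill_with_zeros():
    # Allocate with np.zeros, then overwrite both halves.
    new = np.zeros((2 * rows, array.shape[1]))
    new[:rows] = array
    np.square(array, out=new[rows:])
    return new

def fill_with_empty():
    # Allocate with np.empty, then write both halves exactly once.
    new = np.empty((2 * rows, array.shape[1]))
    new[:rows] = array
    np.square(array, out=new[rows:])
    return new

for func in (fill_with_zeros, fill_with_empty):
    seconds = timeit.timeit(func, number=20) / 20
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```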

Hugues · Accepted Answer · 2021-04-22T06:26:22.157

I find that the fastest solution is to create an empty larger array and then copy the input array into its initial rows:

import numpy as np

shape = (1000, 1000)
array = np.ones(shape)
new_shape = (2000, 1000)

def version1():  # Uses np.concatenate().
  new_rows = np.square(array)
  return np.concatenate((array, new_rows), axis=0)

def version2():  # Initializes new array using np.zeros().
  new = np.zeros(new_shape)
  new[:shape[0]] = array
  np.square(array, out=new[shape[0]:])
  return new

def append_new_empty_rows(array, num_rows):
  new = np.empty((array.shape[0] + num_rows, array.shape[1]), dtype=array.dtype)
  new[:array.shape[0]] = array
  return new

def version3():  # Initializes new array using np.empty().
  new = append_new_empty_rows(array, num_rows=array.shape[0])
  np.square(array, out=new[array.shape[0]:])
  return new

assert np.all(version1() == version2())
assert np.all(version1() == version3())

%timeit version1()  # 4.34 ms per loop
%timeit version2()  # 3.15 ms per loop
%timeit version3()  # 2.24 ms per loop
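
For inputs that are not 2-D float arrays, a variant of `append_new_empty_rows` that takes its dtype from the input and accepts any number of trailing dimensions might look like the sketch below. The name `append_empty_rows` is hypothetical and is not a NumPy function.

```python
import numpy as np

def append_empty_rows(array, num_rows):
    """Return a copy of `array` with `num_rows` uninitialized rows appended along axis 0."""
    new_shape = (array.shape[0] + num_rows,) + array.shape[1:]
    new = np.empty(new_shape, dtype=array.dtype)
    new[:array.shape[0]] = array  # the only copy of the input data
    return new

ints = np.arange(6).reshape(2, 3)
grown = append_empty_rows(ints, 2)
np.square(ints, out=grown[2:])  # fill the new rows in place
print(grown.dtype == ints.dtype)  # True
print(grown.tolist())  # [[0, 1, 2], [3, 4, 5], [0, 1, 4], [9, 16, 25]]
```

The appended rows hold arbitrary values until they are assigned, so they should be written before being read.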
