6

I will keep it simple. I have a loop that appends a new row to a NumPy array on each iteration. What is the efficient way to do this?

import numpy as np

n = np.zeros([1, 2])
for x in [[2, 3], [4, 5], [7, 6]]:
    n = np.append(n, [x], axis=0)

Now the thing is, there is a [0,0] stuck to it, so I have to remove it with

   del n[0]

which seems dumb... So please tell me an efficient way to do this.

   n=np.empty([1,2])

is even worse: it creates uninitialised values.

user3443615
  • Why don't you just do `n = np.array([[2,3],[4,5],[7,6]])`? – BrenBarn Jun 26 '14 at 20:04
  • It's just an example; in my program every iteration appends a different value – user3443615 Jun 26 '14 at 20:07
  • Appending to numpy arrays is inherently inefficient, so this kind of approach is never going to be great performance-wise. – BrenBarn Jun 26 '14 at 20:11
  • I believe `del n[0]` will raise an error if `n` is a numpy array. – Bi Rico Jun 26 '14 at 20:15
  • It will be more efficient to append to a list, and build the array from that list of lists. Appending to an array is not efficient. – hpaulj Jun 26 '14 at 21:03
  • take a look at this question as well: http://stackoverflow.com/questions/24401310/how-do-you-create-a-multidimensional-numpy-array-from-an-iterable-of-tuples – newtover Jun 27 '14 at 22:28
  • Does this answer your question? [Fastest way to grow a numpy numeric array](https://stackoverflow.com/questions/7133885/fastest-way-to-grow-a-numpy-numeric-array) – user202729 Oct 25 '21 at 00:57

3 Answers

10

A bit of technical explanation for the "why lists" part.

Internally, the problem with a list of unknown length is that it needs to fit in memory somehow, however long it grows. There are essentially two different possibilities:

  1. Use a data structure (linked list, some tree structure, etc.) which makes it possible to allocate memory separately for each new element in a list.

  2. Store the data in a contiguous memory area. This area has to be allocated when the list is created, and it has to be larger than what we initially need. If we get more stuff into the list, we need to try to allocate more memory, preferably at the same location. If we cannot do it at the same location, we need to allocate a bigger block and move all data.

The first approach enables all sorts of fancy insertion and deletion options, sorting, etc. However, it is slower for sequential reading and allocates more memory. Python actually uses method #2: lists are stored as "dynamic arrays". For more information on this, please see:

Size of list in memory

What this means is that lists are designed to be very efficient with the use of append. There is very little you can do to speed things up if you do not know the size of the list beforehand.
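
You can watch the over-allocation happen yourself with sys.getsizeof. This is a minimal sketch (the exact byte counts are a CPython implementation detail and vary across versions and platforms):

import sys

lst = []
last_size = 0
for i in range(64):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:   # the size jumps only when capacity grows
        print(len(lst), size)
        last_size = size

The size jumps in steps rather than on every append, which is exactly the extra-capacity strategy described above.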


If you know even the maximum size of the list beforehand, you are probably best off allocating a numpy.array with numpy.empty (not numpy.zeros) at the maximum size, and then using ndarray.resize to shrink the array once you have filled in all the data.
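
A minimal sketch of that approach, assuming an upper bound of MAX_ROWS rows; the loop data is just a stand-in for your real data source:

import numpy as np

MAX_ROWS = 1000                       # assumed upper bound on the row count

n = np.empty((MAX_ROWS, 2))           # allocate once; contents uninitialised
count = 0
for x in [[2, 3], [4, 5], [7, 6]]:    # stand-in for the real loop
    n[count] = x
    count += 1

n.resize((count, 2))                  # shrink in place to the rows actually filled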

For some reason numpy.array(l), where l is a list, is often slow for large lists, whereas copying even a large array is quite fast (I just tried to create a copy of a 100,000,000-element array; it took less than 0.5 seconds).
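
If you want to check this on your own machine, a quick comparison along these lines works (N is reduced here so the run finishes quickly):

import timeit
import numpy as np

N = 10_000_000                 # smaller than the 100,000,000 above
lst = list(range(N))
arr = np.arange(N)

print(timeit.timeit(lambda: np.array(lst), number=1))   # list -> array
print(timeit.timeit(lambda: arr.copy(), number=1))      # array -> array copy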

This discussion has more benchmarking on different options:

Fastest way to grow a numpy numeric array

I have not benchmarked the numpy.empty + ndarray.resize combo, but both should be microsecond rather than millisecond operations.

DrV
  • A Python list is kind of using both (1) and (2), since every object has its own allocation and the list element is just a reference to it. Compare with a numpy record array, where each record is stored inline in the array, compactly using the (2) strategy. What sets them apart is their allocation strategy; the list will allocate extra capacity. The (1) factor ensures that each slot in the list is by itself quite small, so the unused capacity doesn't cost that much. – bluss Jan 24 '17 at 23:10
  • I found that at least one of the best solutions in my case was to allocate a huge enough space upfront and later change it and move the custom "length" variable. Required a bit more work, but performance was worth it. – Íhor Mé Aug 22 '17 at 07:05
6

There are three ways to do this. If you already have everything in a list:

data = [[2, 3], [4, 5], [7, 6]]
n = np.array(data)

If you know how big the final array will be:

exp = np.array([2, 3])    

n = np.empty((3, 2))
for i in range(3):
    n[i, :] = i ** exp

If you don't know how big the final array will be:

exp = np.array([2, 3])

n = []
i = np.random.random()
while i < .9:
    n.append(i ** exp)
    i = np.random.random()
n = np.array(n)

Just for the record, you can start with n = np.empty((0, 2)), but I would not suggest appending to that array in a loop.
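
For completeness, that discouraged pattern would look like this; note that every np.append re-allocates and copies the whole array:

n = np.empty((0, 2))
for x in [[2, 3], [4, 5], [7, 6]]:
    n = np.append(n, [x], axis=0)   # full copy on every iteration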

Bi Rico
  • All methods use a list... Isn't there a way to do it totally in np.arrays... couldn't the numpy guys figure out a way?? And `n = np.empty((0, 2))` doesn't work – user3443615 Jun 26 '14 at 20:17
  • I 100% guarantee that `n = np.empty((0, 2))` works. You don't have to use lists, but you have to use something. Your data cannot come from thin air; it has to be in some structure before it gets put into an array (i.e., a file, another array, a list, a tuple, a double ...). – Bi Rico Jun 26 '14 at 20:23
  • Also in the third case, when you don't know the final size of the array, I used a list intentionally. Python lists are actually implemented as a [dynamic array](http://en.wikipedia.org/wiki/Dynamic_array), which is what you want in the last case. Numpy arrays are not dynamic arrays, so you're better off using a list. – Bi Rico Jun 26 '14 at 20:30
0

You might want to try:

import numpy as np

n = np.reshape([], (0, 2))
for x in [[2, 3], [4, 5], [7, 6]]:
    n = np.append(n, [x], axis=0)

Instead of np.append you can also use n = np.vstack([n, x]), as sketched below. I also agree with @Bi Rico that I would use a list, if n does not need to be accessed within the loop.
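
A minimal sketch of the np.vstack variant, using the same example data:

import numpy as np

n = np.reshape([], (0, 2))
for x in [[2, 3], [4, 5], [7, 6]]:
    n = np.vstack([n, x])   # re-allocates and copies, like np.append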

Dietrich