25

As I understand, the list type in Python is a dynamic pointer array, which will increase it's capacity when items are appended to it. And an array in NumPy uses a continuous memory area to hold all the data of the array.

Are there any types that dynamically increase its capacity as a list, and stores the value as a NumPy array? Something like List in C#. And it's great if the type has the same interface as a NumPy array.

I can create a class which wraps a NumPy array inside, and resize this array when it's full, such as:

class DynamicArray(object):
    def __init__(self):
        self._data = np.zeros(100)
        self._size = 0

    def get_data(self):
        return self._data[:self._size]

    def append(self, value):
        if len(self._data) == self._size:
            self._data = np.resize(self._data, int(len(self._data)*1.25))
        self._data[self._size] = value
        self._size += 1

but DynamicArray can't be used as a NumPy array, and I think all the views returned by get_data() before np.resize() will hold the old array.

Edit: array type in array module is dynamic array. The following program test the increase factor of list and array:

from array import array
import time
import numpy as np
import pylab as pl

def test_time(func):
    arrs = [func() for i in xrange(2000)]
    t = []
    for i in xrange(2000):
        start = time.clock()
        for a in arrs:
            a.append(i)
        t.append(time.clock()-start)
    return np.array(t)

t_list = test_time(lambda:[])
t_array = test_time(lambda:array("d"))
pl.subplot(211)
pl.plot(t_list, label="list")
pl.plot(t_array, label="array")
pl.legend()
pl.subplot(212)
pl.plot(np.where(t_list>2*np.median(t_list))[0])
pl.plot(np.where(t_array>2*np.median(t_array))[0])
pl.show()

enter image description here

from the graph: the increase factor of list is bigger than array.

Rob Bednark
  • 25,981
  • 23
  • 80
  • 125
HYRY
  • 94,853
  • 25
  • 187
  • 187
  • 1
    You do know that numpy has an append function, right? It creates a copy of the data, but then, so does `numpy.resize` which you use above. If that doesn't do what you want, then could you explain a bit more why you want this? – senderle Aug 05 '11 at 01:38
  • @senderle: Yes I know the append function, but I need a dynamic array that increase it's capacity by a factor such as 1.25 when it is full. – HYRY Aug 05 '11 at 01:48

1 Answers1

22

You may be interested to know that the Python standard library also includes an array module which sounds like just what you want:

This module defines an object type which can compactly represent an array of basic values: characters, integers, floating point numbers. Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained.

Ned Batchelder
  • 364,293
  • 75
  • 561
  • 662
  • 1
    thank you. I didn't know that array has append() method. It will be great if there are some similar type in NumPy, because I want to use ufuncs to do calculation with this dynamic array. – HYRY Aug 05 '11 at 02:08
  • @user772649, for what it's worth, the `append` method of an `array` doesn't increase its capacity by a factor -- it increases its capacity by exactly one. Likewise, the `extend` method increases its capacity by exactly the number of items added. – senderle Aug 05 '11 at 02:13
  • @senderle, I tested the append method of array by measure the time it cost, since resize the array will use more time. I edited the original question and added the increase graph. From the graph, you can see that array increase by factor, which is small than list. – HYRY Aug 05 '11 at 02:34
  • 3
    @user772649, actually, looking at the [source code](http://hg.python.org/cpython/file/5b7e765ce049/Modules/arraymodule.c), I see I was wrong -- `array`s are indeed overallocated. Sorry! I ought to check the source more often :) – senderle Aug 05 '11 at 03:05