If you are creating a 1d array in Python, is there any benefit to using the NumPy package?
3 Answers
It all depends on what you plan to do with the array. If all you're doing is creating arrays of simple data types and doing I/O, the array module will do just fine.
If, on the other hand, you want to do any kind of numerical calculations, the array module doesn't provide any help with that. NumPy (and SciPy) give you a wide variety of operations between arrays and special functions that are useful not only for scientific work but for things like advanced image manipulation or in general anything where you need to perform efficient calculations with large amounts of data.
NumPy is also much more flexible: it supports arrays of any type of Python object, and it can also interact "natively" with your own objects if they conform to the array interface.
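As a quick illustration of the difference (a minimal sketch; exact timings will vary by machine), `array.array` is only storage, while NumPy pushes the elementwise loop down into C:

```python
import numpy as np
from array import array

a = array('d', [1.0, 2.0, 3.0])   # plain storage; no vectorized math
b = np.array([1.0, 2.0, 3.0])

# With array.array you loop (or use a comprehension) in Python:
doubled_arr = array('d', (x * 2 for x in a))

# With NumPy the loop runs in compiled code:
doubled_np = b * 2

print(list(doubled_arr))  # [2.0, 4.0, 6.0]
print(doubled_np)         # [2. 4. 6.]
```

For small arrays the difference is negligible, but for millions of elements the vectorized version is typically orders of magnitude faster.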

In defense of array.array, I think it's important to note that it is also a lot more lightweight than numpy.array, and that saying 'will do just fine' for a 1D array should really be 'a lot faster, smaller, and works in pypy/cython without issues.' I love NumPy, but for simple arrays the array.array module is actually better. – J.J Nov 01 '15 at 11:59
@J.J having never used an array.array in my life, would you mind giving a use case where it is meaningfully better (perhaps I am overusing np.ndarray)? – VF1 Jan 01 '18 at 00:02
A small benchmark for the benefit of whoever might find this useful (following the excellent answer by @dF.):
```python
import numpy as np
from array import array

# Fixed-size numpy array
def np_fixed(n):
    q = np.empty(n)
    for i in range(n):
        q[i] = i
    return q

# Resize with np.resize (returns a new array)
def np_class_resize(isize, n):
    q = np.empty(isize)
    for i in range(n):
        if i >= q.shape[0]:
            q = np.resize(q, q.shape[0] * 2)
        q[i] = i
    return q

# Resize in place with the ndarray.resize method
def np_method_resize(isize, n):
    q = np.empty(isize)
    for i in range(n):
        if i >= q.shape[0]:
            q.resize(q.shape[0] * 2)
        q[i] = i
    return q

# array.array append
def arr(n):
    q = array('d')
    for i in range(n):
        q.append(i)
    return q

isize = 1000
n = 10000000
```
```
%timeit -r 10 a = np_fixed(n)
%timeit -r 10 a = np_class_resize(isize, n)
%timeit -r 10 a = np_method_resize(isize, n)
%timeit -r 10 a = arr(n)
```

The output gives:

```
1 loop, best of 10: 868 ms per loop
1 loop, best of 10: 2.03 s per loop
1 loop, best of 10: 2.02 s per loop
1 loop, best of 10: 1.89 s per loop
```
It seems that array.array is slightly faster than the resizing variants and its API saves you some hassle, but if you need more than just storing doubles then numpy.resize is not a bad choice after all (if used correctly).
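The two also interoperate cheaply, so appending into an array.array and switching to NumPy for the math is a reasonable pattern. A sketch: `np.frombuffer` wraps the array's buffer without copying (note the result shares memory with the original array):

```python
import numpy as np
from array import array

q = array('d', range(5))

# Zero-copy view onto the array.array's underlying buffer
v = np.frombuffer(q, dtype='d')

print(v)        # [0. 1. 2. 3. 4.]
print(v.sum())  # 10.0
```

Because the memory is shared, mutating `q` (other than appending, which may reallocate) is visible through `v`, so take a copy with `np.array(q)` if you need an independent array.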

I think the question is refering to [python.array](https://docs.python.org/3/library/array.html) instead of *list*. – tartaruga_casco_mole Aug 08 '19 at 14:52
For storage purposes, numpy arrays and array.array are comparable. Here is benchmark code comparing the per-element storage size of both for 4-byte unsigned integers; other datatypes can be compared the same way. Data for list and tuple is also included for comparison.
```python
import sys
import numpy as np
from array import array

def getsizeof_deep(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important: mark as seen *before* entering recursion to gracefully
    # handle self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([getsizeof_deep(v, seen) for v in obj.values()])
        size += sum([getsizeof_deep(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += getsizeof_deep(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([getsizeof_deep(i, seen) for i in obj])
    return size

print("size per element for list, tuple, numpy array, array.array:===============")
for i in range(1, 100, 5):
    aa = list(range(i))
    n = len(aa)
    list_size = getsizeof_deep(aa)
    tup_aa = tuple(aa)
    tup_size = getsizeof_deep(tup_aa)
    nparr = np.array(aa, dtype='uint32')
    np_size = getsizeof_deep(nparr)
    arr = array('I', aa)  # 4-byte unsigned integer (on Ubuntu)
    arr_size = getsizeof_deep(arr)
    print('number of element:%s, list %.2f, tuple %.2f, np.array %.2f, arr.array %.2f' %
          (len(aa), list_size / n, tup_size / n, np_size / n, arr_size / n))
```
This produced the following output on my machine:
```
size per element for list, tuple, numpy array, array.array:===============
number of element:1, list 88.00, tuple 72.00, np.array 136.00, arr.array 92.00
number of element:6, list 44.67, tuple 42.00, np.array 49.33, arr.array 42.00
number of element:11, list 40.73, tuple 39.27, np.array 41.45, arr.array 37.45
number of element:16, list 39.25, tuple 38.25, np.array 38.50, arr.array 35.75
number of element:21, list 38.48, tuple 37.71, np.array 36.95, arr.array 34.86
number of element:26, list 38.00, tuple 37.38, np.array 36.00, arr.array 34.31
number of element:31, list 37.68, tuple 37.16, np.array 35.35, arr.array 33.94
number of element:36, list 37.44, tuple 37.00, np.array 34.89, arr.array 33.67
number of element:41, list 37.27, tuple 36.88, np.array 34.54, arr.array 33.46
number of element:46, list 37.13, tuple 36.78, np.array 34.26, arr.array 33.30
number of element:51, list 37.02, tuple 36.71, np.array 34.04, arr.array 33.18
number of element:56, list 36.93, tuple 36.64, np.array 33.86, arr.array 33.07
number of element:61, list 36.85, tuple 36.59, np.array 33.70, arr.array 32.98
number of element:66, list 36.79, tuple 36.55, np.array 33.58, arr.array 32.91
number of element:71, list 36.73, tuple 36.51, np.array 33.46, arr.array 32.85
number of element:76, list 36.68, tuple 36.47, np.array 33.37, arr.array 32.79
number of element:81, list 36.64, tuple 36.44, np.array 33.28, arr.array 32.74
number of element:86, list 36.60, tuple 36.42, np.array 33.21, arr.array 32.70
number of element:91, list 36.57, tuple 36.40, np.array 33.14, arr.array 32.66
number of element:96, list 36.54, tuple 36.38, np.array 33.08, arr.array 32.62
```

`L` means unsigned C long, not a 4-byte unsigned integer. You were most likely on a platform where that's 8 bytes. Also, your list and tuple size tests only account for the memory consumption of the list or tuple object itself, not the memory consumption of the elements. – user2357112 Jun 21 '22 at 04:05
The documentation says that it's 4 bytes: https://docs.python.org/3/library/array.html. I am using Ubuntu, which is typically used for servers. – Alok Nayak Jun 22 '22 at 05:16
It says the *minimum* size is 4 bytes. Ubuntu is one of the platforms where it's 8 bytes. – user2357112 Jun 22 '22 at 05:34
Changed 'L' to 'I', whose itemsize is 4 on my machine, and updated the sizeof function. – Alok Nayak Jun 22 '22 at 06:21
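The itemsize question discussed above is easy to check directly on any platform (a quick sketch; the sizes printed are typical of 64-bit Linux, where 'I' is 4 bytes and 'L' is 8):

```python
from array import array

# Print the per-element size of a few unsigned integer typecodes.
# 'I' = C unsigned int, 'L' = C unsigned long, 'Q' = C unsigned long long;
# 'L' in particular varies between platforms (4 on Windows, 8 on 64-bit Linux).
for typecode in ('I', 'L', 'Q'):
    print(typecode, array(typecode).itemsize)
```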