I am using the array module to store a sizable number (many gigabytes' worth) of unsigned 32-bit ints. Rather than using 4 bytes per element, Python is using 8 bytes, as indicated by array.itemsize and verified with pympler.
E.g.:
>>> array("L", range(10)).itemsize
8
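For reference, this is how I checked the itemsize of every typecode on my platform. Each typecode maps to a C type (e.g. 'L' is C unsigned long), so the sizes below are platform-dependent; the 8 for 'L' is what I see here, not a guarantee:

from array import array, typecodes

# Print the C-level size of every array typecode on this platform.
# 'L' (C unsigned long) reports 8 bytes on my machine.
for tc in typecodes:
    print(tc, array(tc).itemsize)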
Since I have a very large number of elements, I would benefit substantially from storing each one in 4 bytes rather than 8.
Numpy will let me store the values as unsigned 32-bit ints:
>>> np.array(range(10), dtype = np.uint32).itemsize
4
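To put the per-element difference in perspective, here is a rough footprint comparison (the element count below is only illustrative; my real arrays are much larger):

import numpy as np
from array import array

n = 10**6  # illustrative size, not my real data size
arr = array("L", range(n))
npa = np.arange(n, dtype=np.uint32)

print(arr.itemsize * len(arr))  # 8_000_000 bytes on my platform ('L' = 8 bytes here)
print(npa.nbytes)               # 4_000_000 bytes (uint32 = 4 bytes)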
The problem is that element access through numpy's index operator is about twice as slow, so any operation that can't be expressed as a numpy vector operation suffers. E.g.:
python3 -m timeit -s "from array import array; a = array('L', range(1000))" "for i in range(len(a)): a[i]"
10000 loops, best of 3: 51.4 usec per loop
vs
python3 -m timeit -s "import numpy as np; a = np.array(range(1000), dtype = np.uint32)" "for i in range(len(a)): a[i]"
10000 loops, best of 3: 90.4 usec per loop
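I suspect part of the overhead is that indexing a numpy array boxes each element into a numpy scalar object, whereas the array module hands back a plain Python int (this is my reading of the situation, not a measured breakdown):

import numpy as np
from array import array

a = array("L", range(10))
b = np.arange(10, dtype=np.uint32)

print(type(a[0]))  # <class 'int'> -- plain Python int
print(type(b[0]))  # <class 'numpy.uint32'> -- boxed numpy scalar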
So it seems I am forced either to use twice as much memory as I would like, or to run roughly twice as slow as I would like. Is there a way around this? Can I force Python arrays to use a specified itemsize?