I am using the array module to store a sizable number (many gigabytes' worth) of unsigned 32-bit ints. Rather than using 4 bytes per element, Python is using 8 bytes, as indicated by array.itemsize and verified with pympler.
E.g.:
>>> array("L", range(10)).itemsize
8
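For reference, this is how I checked the itemsize of every typecode on my platform. Each typecode maps to a C type (e.g. 'L' is C unsigned long), so the sizes below are platform-dependent; the 8 for 'L' is what I see here, not a guarantee:

from array import array, typecodes

# Print the C-level size of every array typecode on this platform.
# 'L' (C unsigned long) reports 8 bytes on my machine.
for tc in typecodes:
    print(tc, array(tc).itemsize)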
Since I have a very large number of elements, I would benefit substantially from storing each one in 4 bytes rather than 8.
Numpy will let me store the values as unsigned 32-bit ints:
>>> np.array(range(10), dtype = np.uint32).itemsize
4
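To put the per-element difference in perspective, here is a rough footprint comparison (the element count below is only illustrative; my real arrays are much larger):

import numpy as np
from array import array

n = 10**6  # illustrative size, not my real data size
arr = array("L", range(n))
npa = np.arange(n, dtype=np.uint32)

print(arr.itemsize * len(arr))  # 8_000_000 bytes on my platform ('L' = 8 bytes here)
print(npa.nbytes)               # 4_000_000 bytes (uint32 = 4 bytes)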
The problem is that element access through numpy's index operator is about twice as slow, so any operation that can't be expressed as a numpy vector operation suffers. E.g.:
python3 -m timeit -s "from array import array; a = array('L', range(1000))" "for i in range(len(a)): a[i]"
10000 loops, best of 3: 51.4 usec per loop
vs
python3 -m timeit -s "import numpy as np; a = np.array(range(1000), dtype = np.uint32)" "for i in range(len(a)): a[i]"
10000 loops, best of 3: 90.4 usec per loop
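I suspect part of the overhead is that indexing a numpy array boxes each element into a numpy scalar object, whereas the array module hands back a plain Python int (this is my reading of the situation, not a measured breakdown):

import numpy as np
from array import array

a = array("L", range(10))
b = np.arange(10, dtype=np.uint32)

print(type(a[0]))  # <class 'int'> -- plain Python int
print(type(b[0]))  # <class 'numpy.uint32'> -- boxed numpy scalar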
So it seems I am forced either to use twice as much memory as I would like, or to run roughly twice as slow as I would like. Is there a way around this? Can I force Python arrays to use a specified itemsize?