What is the way to store this array that takes up the least amount of memory? uint8 doesn't work since some values are negative and int8 doesn't work since some values are above 127. int16 works, but I would rather have it take up less space.

Should I not have it as a numpy array and just store it as a regular python list?

This is the array (I'm only including the first few lines; if you want the entire array, let me know):

array([[[ 218,  219,  223],
        [   0,    0,    0],
        [   2,    2,    2],
        [   1,    1,    1],
        [   0,    0,    0],
        [   0,    0,    0],
        [   0,    0,    0],
        [   0,    0,    0],
        [  -3,   -3,   -3],
        [  -1,   -1,   -1],
        [   0,    0,    0],
        [  -1,   -1,   -1],
        [   0,    0,    0]]], dtype=int16)
  • One strategy would be to store the absolute values as a uint8 array and the signs as a separate boolean array, say 0 for negatives and 1 for positives. Booleans would have a much smaller footprint than ints. – Divakar Dec 12 '19 at 17:06
  • In NumPy, the boolean data type requires one byte. To actually save memory by storing the signs of multiple numbers in a single byte, you'll have to write code that does the appropriate bit-twiddling. How big is your actual array? How much memory are you trying to save? – Warren Weckesser Dec 12 '19 at 17:17
  • `store` - in working memory, or on a file of some sort? – hpaulj Dec 12 '19 at 17:20
  • @Divakar sorry, I interpreted it differently – Den Fula Ankungen Dec 12 '19 at 17:21
  • @hpaulj like a file, say I want to make a compressed image file – Den Fula Ankungen Dec 12 '19 at 17:21
  • @WarrenWeckesser thanks – Den Fula Ankungen Dec 12 '19 at 17:32
  • It depends a bit on the actual use case. But using a fast compression/decompression algorithm like blosc with int16 would be one of the first things I would try. https://stackoverflow.com/a/56761075/4045774 What is the maximum size of the int16 array? The entire array would be good for testing. Of course you don't have to write the data to disk; you can also use in-memory compression. – max9111 Dec 13 '19 at 12:52
  • Is it important to be able to differentiate between 3 and 4 in the data? If not, you could just divide your data by 2 and use an `int8`. – Mark Setchell Dec 13 '19 at 21:33

1 Answer

I tried

import numpy as np
a = np.array([[[ 218,  219,  223],
        [   0,    0,    0],
        [   2,    2,    2],
        [   1,    1,    1],
        [   0,    0,    0],
        [   0,    0,    0],
        [   0,    0,    0],
        [   0,    0,    0],
        [  -3,   -3,   -3],
        [  -1,   -1,   -1],
        [   0,    0,    0],
        [  -1,   -1,   -1],
        [   0,    0,    0]]], dtype=np.int16)
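signs = a < 0  # boolean mask of the negative entries (the comparison already yields dtype bool)
a = np.abs(a).astype(np.uint8)  # magnitudes only; a bare astype(np.uint8) would wrap -3 to 253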

But this turned out to be worse than the original: sys.getsizeof() reports 206 bytes for the int16 array, versus 167 for the signs plus 167 for the uint8 array.
The boolean array takes as much space as the uint8 one, because NumPy stores each boolean in a full byte. After looking into it, I found that np.packbits finally gets you somewhere:

signs2 = np.packbits(signs, axis=None)

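Although the packed signs come to only 106 bytes, the uint8 + signs combination still loses out to the original int16. Those sizes are as reported by sys.getsizeof(), which includes the fixed overhead of each array object. If you use len(x.tobytes()) instead (tostring() is the older, deprecated name for the same method), you will find 78 bytes for the original array, 39 for the unsigned 8-bit array, 39 for the boolean signs, and 5 for the packed signs. In raw data the split therefore does win: 39 + 5 = 44 bytes against 78.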
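
For completeness, here is a minimal sketch of the full round trip: split the array into magnitudes and packed sign bits, then rebuild the int16 array from them. It assumes every value lies in [-255, 255], which holds for the excerpt shown but is not verified for the full array. The names split_signs and join_signs are hypothetical helpers made up for this example, not NumPy functions.

```python
import numpy as np

a = np.array([[[218, 219, 223],
               [0, 0, 0],
               [2, 2, 2],
               [1, 1, 1],
               [0, 0, 0],
               [0, 0, 0],
               [0, 0, 0],
               [0, 0, 0],
               [-3, -3, -3],
               [-1, -1, -1],
               [0, 0, 0],
               [-1, -1, -1],
               [0, 0, 0]]], dtype=np.int16)


def split_signs(arr):
    """Return (uint8 magnitudes, bit-packed sign flags) for an integer array.

    Hypothetical helper; assumes all values lie in [-255, 255].
    """
    if arr.min() < -255 or arr.max() > 255:
        raise ValueError("values outside [-255, 255] do not fit in uint8")
    mags = np.abs(arr).astype(np.uint8)
    # One bit per element (1 = negative), flattened and padded to whole bytes.
    packed = np.packbits(arr < 0, axis=None)
    return mags, packed


def join_signs(mags, packed):
    """Rebuild the int16 array from magnitudes and packed sign bits."""
    # Drop the padding bits that packbits added at the end.
    bits = np.unpackbits(packed)[:mags.size]
    negative = bits.astype(bool).reshape(mags.shape)
    out = mags.astype(np.int16)
    out[negative] *= -1
    return out


mags, packed = split_signs(a)
restored = join_signs(mags, packed)

print(a.nbytes)                     # 78
print(mags.nbytes + packed.nbytes)  # 44
print(np.array_equal(restored, a))  # True
```

The shape of the original array is not stored in the packed bits, so it has to come from the magnitude array (as here) or be saved separately.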

jeremy_rutman