I have a huge dataset of images with a resolution of 1280*1918 in .jpg format. One image on disk weighs less than 100 kB, but when I load it with skimage.io it takes more than 7 MB in memory. So I decided to try storing it in a sparse format, but this gives very unexpected results.

So I have the following code:

import sys

from skimage.io import imread
from scipy import sparse

suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
def humansize(nbytes):
    i = 0
    while nbytes >= 1024 and i < len(suffixes)-1:
        nbytes /= 1024.
        i += 1
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])

first_image = imread("imgname.jpg")
humansize(sys.getsizeof(first_image))
>>> 7.02 MB

Then I picked the first sparse format I found and converted the image to it (because the image is a 3-D array, I took only one channel):

sps = sparse.bsr_matrix(first_image[:,:,0])
humansize(sys.getsizeof(sps))
>>> 80 B

I do understand that this is only one third of the image, but 80*3 is far less than I expected. Can this size be real? Are there any other ways to store images that are more memory-efficient? My goal is to fit as close to 5000 images as possible in 8 GB of RAM.
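
For reference, here is a rough sanity check of the numbers above. It is only a sketch and assumes the image decodes to a uint8 array of shape 1280 x 1918 x 3 (which is what the 7.02 MB figure suggests) and that each file is at most 100 kB on disk:

```python
# Back-of-the-envelope memory estimate. Shape and dtype are assumptions
# inferred from the question: 1280 x 1918 pixels, 3 channels, 1 byte each.
height, width, channels = 1280, 1918, 3
bytes_per_image = height * width * channels        # uint8 -> 1 byte per value
n_images = 5000

raw_total_gib = bytes_per_image * n_images / 1024**3
jpeg_total_mib = 100 * 1000 * n_images / 1024**2   # upper bound at 100 kB per file

print(bytes_per_image / 1024**2)   # MiB per decoded image
print(raw_total_gib)               # GiB for 5000 decoded images
print(jpeg_total_mib)              # MiB for 5000 compressed files
```

Under these assumptions the decoded images alone would need several times the 8 GB budget, while the compressed files would fit comfortably.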

UpmostScarab
  • JPEG images are compressed, which is why they are very small. When you load such an image it is decompressed, and the raw pixel data in memory is much bigger. A sparse representation will only help if the image contains almost nothing (i.e. lots of 0-valued black pixels). If that is not the case, you could keep the raw file content in memory and decompress individual images when needed. Note that the reported size of a BSR matrix is always the same, even for an empty matrix. – MB-F Aug 29 '17 at 07:21
  • @kazemakase I do understand that there's a compression. The idea with storing raw files is actually good, thank you. I believe imread can handle decompressing on the fly, not from file. In my case there's more white in the picture so it's more surprising that the size is low. I would like to figure out if I'm measuring it incorrectly. – UpmostScarab Aug 29 '17 at 07:32
  • You only measure the size of the sparse matrix object but not of the data it contains. See [here](https://stackoverflow.com/a/11173074/3005167) for some more details. – MB-F Aug 29 '17 at 07:37
  • @kazemakase I shall test that – UpmostScarab Aug 29 '17 at 07:42
  • @kazemakase so I tested that and it is 4.7 MB. Thank you. You probably should submit an answer. – UpmostScarab Aug 29 '17 at 09:01
  • Good to hear it worked out. I'm rather busy at the moment.. guess I can draft up an answer later today if nobody beats me to it ;) – MB-F Aug 29 '17 at 11:13
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/153170/discussion-between-upmostscarab-and-kazemakase). – UpmostScarab Aug 29 '17 at 19:59
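
To illustrate the point from the comments about measuring the data rather than the wrapper object, here is a minimal sketch. The helper name `sparse_nbytes` and the synthetic mostly-white channel are made up for illustration, and it uses `csr_matrix` instead of the `bsr_matrix` from the question to keep things simple (both formats keep their contents in `data`, `indices` and `indptr` arrays):

```python
import sys

import numpy as np
from scipy import sparse


def sparse_nbytes(m):
    """Bytes held by the three arrays behind a CSR/CSC/BSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


# Synthetic stand-in for one image channel (assumption: uint8, mostly white).
rng = np.random.default_rng(0)
channel = np.full((1280, 1918), 255, dtype=np.uint8)
channel[rng.random(channel.shape) < 0.1] = 0    # roughly 10% black pixels

sps = sparse.csr_matrix(channel)

print(sys.getsizeof(sps))    # wrapper object only, independent of the contents
print(sparse_nbytes(sps))    # stored values plus the index arrays
print(channel.nbytes)        # dense size for comparison
```

For a mostly-white channel almost every pixel is non-zero, so the sparse version has to store an index next to each value and ends up larger than the dense array. Sparse storage only pays off when most entries are zero.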
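
The suggestion to keep the compressed file content in memory and decode on demand could look roughly like the sketch below. It assumes Pillow is available and uses it for decoding instead of skimage.io; the helper names, the `store` dict and its key are hypothetical, and the image is generated in memory so the example runs without any files:

```python
import io

import numpy as np
from PIL import Image


def encode_jpeg(array, quality=90):
    """Compress a uint8 RGB array to JPEG bytes (stand-in for reading a .jpg file)."""
    buf = io.BytesIO()
    Image.fromarray(array).save(buf, format="JPEG", quality=quality)
    return buf.getvalue()


def decode_jpeg(raw):
    """Decompress JPEG bytes back into a NumPy array."""
    with Image.open(io.BytesIO(raw)) as img:
        return np.asarray(img.convert("RGB"))


# Synthetic mostly-white image; with real data you would read each file's
# bytes instead, e.g. open(path, "rb").read().
image = np.full((1280, 1918, 3), 255, dtype=np.uint8)
image[400:800, 600:1200] = 0

store = {"img_0": encode_jpeg(image)}    # holds compressed bytes only

decoded = decode_jpeg(store["img_0"])    # decoded on demand
print(len(store["img_0"]), decoded.nbytes)
```

Only the compressed bytes stay resident, so the memory cost per image is close to the file size on disk, at the price of decoding time on every access.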

0 Answers