I have a huge dataset of images with resolution 1280×1918 in .jpg format. One image weighs less than 100 kB on disk, but when I load it with skimage.io it takes more than 7 MB of memory. So I decided to try storing it in a sparse format, but that gives very unexpected results.
So I have the following code:
import sys

from skimage.io import imread
from scipy import sparse

suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']

def humansize(nbytes):
    # Scale a byte count down to the largest fitting unit
    i = 0
    while nbytes >= 1024 and i < len(suffixes) - 1:
        nbytes /= 1024.
        i += 1
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])
first_image = imread("imgname.jpg")
humansize(sys.getsizeof(first_image))
>>> 7.02 MB
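If I understand it right, that figure is expected: skimage decodes the JPEG into a plain uint8 NumPy array, one byte per channel per pixel, so the on-disk compression is gone. The raw pixel buffer alone accounts for it (1280 * 1918 * 3 = 7,365,120 bytes):

humansize(first_image.nbytes)  # size of the pixel buffer, ignoring object overhead
>>> 7.02 MB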
Then I took the first sparse matrix format listed in scipy.sparse and tried converting the image to it (sparse matrices are 2-D and the image is a 3-D array, so I took only one channel):
sps = sparse.bsr_matrix(first_image[:, :, 0])
humansize(sys.getsizeof(sps))
>>> 80 B
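I suspect sys.getsizeof is misleading here: it counts only the thin Python wrapper object, not the NumPy arrays inside the sparse matrix that hold the actual data. If that's right, the real footprint of the BSR matrix would be something like this (assuming its standard data, indices and indptr attributes):

# Sum the underlying buffers, which getsizeof does not follow
real_size = sps.data.nbytes + sps.indices.nbytes + sps.indptr.nbytes
humansize(real_size)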
I do understand that this is just one third of the image, but 80 B * 3 is far less than I expected. Can this size be real, or is my measurement off? Are there other methods to store images that are more memory-efficient? My goal is to fit as close to 5000 images as possible in 8 GB of RAM.
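For reference, the arithmetic behind that goal, using the decoded size from above:

raw_per_image = 1918 * 1280 * 3    # bytes per decoded uint8 RGB image
humansize(5000 * raw_per_image)
>>> 34.3 GB

So decoded arrays alone would need roughly 34 GB, far over my 8 GB budget, while the JPEG files themselves (under 100 kB each) would total only about 0.5 GB. That is the gap I am trying to close.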