Using the idea from this code for changing image size dynamically in a loop, I have run into a problem. There are a couple of methods to get the image size in bytes, but only one gives accurate results, and it requires the file to be saved to disk. If I save to disk and read the file back every time, it will take double the effort per iteration. Is there any way to get the image size accurately without saving?

from io import BytesIO
import os
import sys

from PIL import Image

image = Image.open(image_path)
size_kb = os.stat(image_path).st_size  # size of the existing file on disk, in bytes
buffer = BytesIO()
image.save(buffer, format="jpeg", quality=100, optimize=True)  # encodes into memory instead of writing to disk
size_kb2 = buffer.getbuffer().nbytes  # size of the re-encoded JPEG, in bytes

Printing the three values with print(size_kb, size_kb2, sys.getsizeof(image.tobytes())) gives me 3 different results for the same image, where os.stat gives the accurate result (the same as shown by the Linux OS).

I do not want to save the image to disk and read it again because it will take a lot of time.

Whole code:

STEP = 32
MIN_SIZE = 32

def resize_under_kb(image: Image.Image, size_kb: float, desired_size: float) -> Image.Image:
    '''
    Resize the image until it is under the given size in KB
    args:
        image: PIL Image object
        size_kb: Current size of the image in KB
        desired_size: Final desired size in KB asked for by the user
    '''
    size = image.size
    new_width_height = max(size) - STEP  # Decrease the pixels for the first pass

    # Stop when the image reaches the minimum dimension or the desired size
    while new_width_height > MIN_SIZE and size_kb > desired_size:
        # Note: this forces a square image, so the aspect ratio is not preserved
        image = image.resize((new_width_height, new_width_height))

        buffer = BytesIO()
        image.save(buffer, format="jpeg", quality=100, optimize=True)  # Encodes into memory instead of writing to disk
        size_kb = buffer.getbuffer().nbytes / 1024  # nbytes is in bytes, so convert to KB

        size = image.size  # Current resized pixels
        new_width_height = max(size) - STEP  # Dimensions for the next iteration

    return image
Christoph Rackwitz
Deshwal
  • Accurately i think not. When writing to disk there can be small differences based on your filesystem and image to bytes might not include any metadata. On a sidenote resizing the image multiple times will cause a loss of quality, better to keep the original and do one big resize. – Eumel Apr 21 '22 at 07:21
  • please look up the purpose of `sys.getsizeof`. that's merely telling you the amount of RAM used by a specific *object* – Christoph Rackwitz Apr 21 '22 at 07:47
  • @ChristophRackwitz yes, But isn't it proportional to the size of image in a way? I mean `number of pixels * memory taken by each bit`? Just saying – Deshwal Apr 21 '22 at 07:57
  • Saving JPEGs with `quality=100` is unlikely to be sensible if trying to reduce the size of an image! Please be clearer about your actual intention. What are you really trying to do with what type of images and what type of pixel dimensions and what type of sizes in bytes? – Mark Setchell Apr 21 '22 at 08:01
  • @Eumel Is there a way to approximate the size based on the resized dimensions **keeping the aspect ratio**? Someone with zero knowledge of image processing who wants to keep an image under a specific size can't know. In case you're wondering whether this is even a problem, try applying for any exam in India. Each application has its own limits on what size an image should be. At least 10M such forms are filled every year. – Deshwal Apr 21 '22 at 08:02
  • Measuring the size of a JPEG on disk and comparing it to its size in RAM is misguided. Measuring the size of a JPEG on disk and expecting to write it back to disk at the same size is also misguided. – Mark Setchell Apr 21 '22 at 08:03
  • @MarkSetchell Hey Mark! I actually got your point. I know that keeping quality to 100 won't do any good. I'm just learning. I'll keep that thing in mind. Just exploring ways to reduce the size. But the original question here is can we get the actual image size without saving? Can you please explain how **saving to disc** is also misguided? what could be the final solution then? – Deshwal Apr 21 '22 at 08:04
  • It should be `buffer.getbuffer().nbytes` but you didn't share the output of your code, so nobody except you knows what you got, what you expected or why you think it's wrong. – Mark Setchell Apr 21 '22 at 08:07
  • I didn't say saving it to disk is misguided. I said saving it to disk *"and expecting it to have the same size as you started with"* was misguided. – Mark Setchell Apr 21 '22 at 08:10
  • @MarkSetchell if you open any image, you'll get 3 different results for sure. [Please take a look at this answer](https://stackoverflow.com/questions/29319858/how-to-get-image-size-in-kb-while-using-pillow-python-before-storing-to-disk) – Deshwal Apr 21 '22 at 08:12
  • @Deshwal That answer is misguided, since it's decoding a JPEG (a lossy, compressed format) from disk, encoding that image as a PNG (a lossless optionally compressed format), and trying to compare the two. – AKX Apr 21 '22 at 08:23
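
The suggestions in the comments above (resize once from the original rather than repeatedly, keep the aspect ratio, and avoid `quality=100`) can be put together roughly as follows. This is only a sketch under stated assumptions: `jpeg_bytes` and `shrink_to_fit` are hypothetical helper names, the default quality of 85 and the 0.9 scale step are arbitrary choices, and the input is assumed to be in a mode JPEG can store (such as RGB or L).

```python
from io import BytesIO

from PIL import Image


def jpeg_bytes(image, quality=85):
    """Encode `image` as JPEG in memory and return the encoded bytes."""
    buffer = BytesIO()
    image.save(buffer, format="JPEG", quality=quality, optimize=True)
    return buffer.getvalue()


def shrink_to_fit(original, max_bytes, quality=85, step=0.9, min_side=32):
    """Return (image, encoded_bytes) with the encoded size under max_bytes if possible.

    Each candidate is resized from the untouched original, so resampling
    losses do not accumulate, and both sides use the same scale factor,
    so the aspect ratio is kept (up to rounding).
    """
    scale = 1.0
    while True:
        width = max(1, round(original.width * scale))
        height = max(1, round(original.height * scale))
        candidate = original if scale == 1.0 else original.resize((width, height))
        data = jpeg_bytes(candidate, quality)
        # Stop once the target is met or the image has become too small to shrink further.
        if len(data) <= max_bytes or min(width, height) <= min_side:
            return candidate, data
        scale *= step


# Synthetic test image so the sketch runs without any file on disk.
source = Image.radial_gradient("L").convert("RGB").resize((512, 384))
result, data = shrink_to_fit(source, max_bytes=8_000)
print(result.size, len(data))
```

Because `data` is exactly the byte string that would be written out, writing it with `open(path, "wb").write(data)` should produce a file whose length is `len(data)`, with no need to save and re-read in every iteration.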

1 Answer

This code:

size_kb = os.stat(image_path).st_size

gives the number of bytes an existing JPEG takes on disk.


This code:

buffer = BytesIO()
image.save(buffer, format="jpeg", quality = 100, optimize = True) # Does not save but acts like an image saved to disc
size_kb2 = (buffer.getbuffer().nbytes)

gives the number of bytes the image would take on disk if saved by PIL's current JPEG encoder, with its own Huffman tables, quality and chroma subsampling, and without allowing for file-system minimum block sizes.

This could be vastly different from the size you read from disk originally because that might have been created by different software, with different tradeoffs of speed and quality. It could even differ between two versions of PIL.
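
As a rough illustration of this point, the sketch below encodes the same pixels twice with different settings and also writes one encoding to a temporary file. The helper name `encoded_size` and the synthetic gradient image are assumptions made for the example, not part of the question's code.

```python
import os
import tempfile
from io import BytesIO

from PIL import Image


def encoded_size(image, **save_kwargs):
    """Return the number of bytes Pillow produces when encoding `image`."""
    buffer = BytesIO()
    image.save(buffer, **save_kwargs)
    return buffer.getbuffer().nbytes


# Synthetic image so the example needs no existing file.
image = Image.radial_gradient("L").convert("RGB")

# Same pixels, different encoder settings, different sizes.
size_q100 = encoded_size(image, format="JPEG", quality=100, optimize=True)
size_q75 = encoded_size(image, format="JPEG", quality=75, optimize=True)
print(size_q100, size_q75)

# Saving with identical settings to a real file is expected to give
# the same length that the in-memory buffer reported.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "check.jpg")
    image.save(path, format="JPEG", quality=75, optimize=True)
    size_on_disk = os.stat(path).st_size
print(size_on_disk, size_q75)
```

So the in-memory measurement is a reliable predictor of what Pillow itself will write with those settings; it just says nothing about the size of a file some other encoder produced.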


This code:

len(image.tobytes())

tells you the number of bytes your image is taking as currently decompressed in memory, without taking account of other data structures required for it and without taking account of metadata (comments, GPS data, copyright, manufacturer lens data and settings data).
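
A small sketch of why this number is unrelated to the file size: for an uncompressed 8-bit image, the length of `tobytes()` is just width times height times the number of bands, regardless of content. The solid-colour test image here is an assumption made for the example.

```python
import sys

from PIL import Image

# A solid-colour image compresses to a tiny JPEG, yet its raw pixel
# data is as large as that of any other 640x480 RGB image.
image = Image.new("RGB", (640, 480), color=(200, 30, 30))

raw = image.tobytes()
width, height = image.size
bands = len(image.getbands())  # 3 for RGB

print(len(raw), width * height * bands)

# sys.getsizeof adds the bookkeeping overhead of the Python bytes
# object on top of the raw pixel data, so it is slightly larger still.
print(sys.getsizeof(raw) - len(raw))
```

Neither figure depends on how well the image compresses, which is why neither matches what `os.stat` reports for the JPEG file.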

Mark Setchell