How to calculate the resulting filesize of Image.resize() in PIL

Question

I have to reduce incoming files to a size of max 1MB. I use PIL for image operations and python 3.5. The filesize of an image is given by:

import os
src = 'testfile.jpg'
print(os.path.getsize(src))

which gives in my case 1531494. If I open the file with PIL, I can only get the dimensions:

from PIL import Image
src = 'testfile.jpg'
image = Image.open(src)
size =  image.size
print(size)

which gives in my case (1654, 3968)

Of course I can loop over the file as below with different sizes, save the file, and check its file size. But there must be a simpler way, because this takes too much time (for example, when downsizing 1000 files of different sizes).

def resize_image(src, reduceby=1):
    '''
    resizes image by percent given in reduceby
    '''
    print(" process_image:",src, reduceby)
    org = Image.open(src)
    real_size = org.size
    reduced_size = (int(real_size[0] * reduceby / 100),int(real_size[1] * reduceby / 100) )
    # resize() returns a new image; Image.LANCZOS replaces the removed Image.ANTIALIAS alias
    org = org.resize(reduced_size, Image.LANCZOS)
    reduced_file = src[:-4] +"_" + str(reduceby) + src[-4:]
    org.save(reduced_file, optimize=True)
    print(" reduced_image:", reduced_file)
    reduced_filesize = os.path.getsize(reduced_file)
    return reduced_filesize, reduced_file

def loop_image(src, target_size):
    print("loop_image    :", src, target_size)
    file_size = os.path.getsize(src)
    reduced_file =src
    print("source        :", src, file_size)
    reduce_by = 1
    while file_size > target_size:
        file_size, reduced_file = resize_image(src, reduce_by)
        print("target       :", file_size, reduced_file)
        reduce_by += 1
    return reduced_file

This function works, but it reduces too much and takes too much time. My question is: How can I calculate the resulting filesize before I resize it? Or is there a simpler way?

Do you simply want to maintain the ratio of the file, but small enough for it to fit into 1 MB? — Thymen, Mar 03 '21 at 11:16
You can do it *"in-memory"* using an `io.BytesIO` like I do here https://stackoverflow.com/a/52281257/2836621 Obviously you would reduce the lengths of the sides rather than reducing the quality, but the principle is the same and the code uses a binary search to make it faster. — Mark Setchell, Mar 03 '21 at 13:44
@Thymen: Simply yes, I have to... Because the images are needed in a program which supports only 1MB size. The incoming files have between 0.8 and 2.9 MB @Mark: This I haven't seen before, thanks. But I already did experiments with the quality option `im.save(buffer, format="JPEG", quality=m)`. I think resizing gives better results. — Papageno, Mar 04 '21 at 13:04

Thymen · Accepted Answer · 2021-03-03T15:01:22.633

Long story short, you do not know how well the image will be compressed, because it depends a lot on what kind of image it is. That said, we can optimize your code.

Some optimizations:

Approximate the number of bytes per pixel from the file size and the image dimensions.
Iteratively correct a scaling ratio based on how far the new file size is from the target file size.

My solution applies both of the above methods, because applying them separately didn't seem to result in very stable convergence. The following sections explain both parts in more depth and show the test cases that I considered.

Reducing image memory

The following code approximates the new image dimensions from the ratio between the original file size (in bytes) and the preferred file size (in bytes). It estimates the number of bytes per pixel and then applies the ratio between the preferred and the original bytes per pixel to the image width and height (hence the square root, since the pixel count scales with the square of the side length).

I use opencv-python (cv2) for the image rescaling, but that can be replaced with your own PIL code.

import os

import cv2
import numpy as np


def reduce_image_memory(path, max_file_size: int = 2 ** 20):
    """
        Reduce the image memory by downscaling the image.

        :param path: (str) Path to the image
        :param max_file_size: (int) Maximum size of the file in bytes
        :return: (np.ndarray) downscaled version of the image
    """
    image = cv2.imread(path)
    height, width = image.shape[:2]

    original_memory = os.stat(path).st_size
    original_bytes_per_pixel = original_memory / np.prod(image.shape[:2])

    # perform resizing calculation
    new_bytes_per_pixel = original_bytes_per_pixel * (max_file_size / original_memory)
    new_bytes_ratio = np.sqrt(new_bytes_per_pixel / original_bytes_per_pixel)
    new_width, new_height = int(new_bytes_ratio * width), int(new_bytes_ratio * height)

    new_image = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_LINEAR_EXACT)
    return new_image
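
Because the pixel count grows with the square of the side length, the calculation above boils down to multiplying both sides by sqrt(max_file_size / original_memory). Below is a minimal pure-Python sketch of just that arithmetic. It assumes the bytes-per-pixel figure stays constant after resizing, which real encoders only approximate, and the helper name `estimate_dimensions` is hypothetical:

```python
import math


def estimate_dimensions(width, height, original_bytes, target_bytes):
    """Estimate new (width, height) so the encoded file lands near target_bytes.

    Assumes file size is proportional to the pixel count, which is only
    a first approximation for compressed formats such as JPEG.
    """
    # The area must shrink by target/original, so each side shrinks by its square root.
    scale = math.sqrt(target_bytes / original_bytes)
    return int(scale * width), int(scale * height)


# Quartering the file size halves each side.
print(estimate_dimensions(1000, 500, 4000000, 1000000))  # (500, 250)
```

Since the estimate ignores how compression efficiency changes with image content and size, it is best treated as a starting point that still needs to be checked against the real encoded size.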

Applying ratio

Most of the magic happens in ratio *= max_file_size / new_memory, where we calculate our error with respect to the preferred size and correct our ratio with that value.

The program keeps adjusting the ratio as long as the following condition holds, and stops once the deviation is within the allowed tolerance:

abs(1 - max_file_size / new_memory) > max_deviation_percentage

This means that the new file size has to be relatively close to the preferred file size. You control this closeness with delta. The higher the delta, the smaller your file can be (that is, the further below max_file_size). The smaller the delta, the closer the new file size will be to max_file_size, but it will never be larger.

The trade-off is in time: the smaller delta is, the more time it takes to find a ratio satisfying the condition. Empirical testing shows that values between 0.01 and 0.05 work well.

if __name__ == '__main__':
    image_location = "test img.jpg"

    # delta denotes the maximum variation allowed around the max_file_size
    # The lower the delta, the more time it takes, but the closer the result will be to `max_file_size`.
    delta = 0.01
    max_file_size = 2 ** 20 * (1 - delta)
    max_deviation_percentage = delta

    current_memory = new_memory = os.stat(image_location).st_size
    ratio = 1
    steps = 0

    # make sure that the comparison is within a certain deviation.
    while abs(1 - max_file_size / new_memory) > max_deviation_percentage:
        new_image = reduce_image_memory(image_location, max_file_size=max_file_size * ratio)
        cv2.imwrite(f"resize {image_location}", new_image)

        new_memory = os.stat(f"resize {image_location}").st_size
        ratio *= max_file_size / new_memory
        steps += 1

    print(f"Memory resize: {current_memory / 2 ** 20:5.2f}, {new_memory / 2 ** 20:6.4f} MB, number of steps {steps}")
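
The correction loop can be studied in isolation from any image library. The sketch below replaces resizing and encoding with a hypothetical `encode` callable that maps a requested size in bytes to the size actually produced. The names `converge` and `encode` and the `step_limit` guard are illustrative and not part of the code above. It shows how `ratio *= target / size` compensates for an encoder that consistently overshoots:

```python
def converge(encode, original_size, max_file_size, delta=0.05, step_limit=10):
    """Repeat the ratio correction until the size is within delta of the target.

    encode(requested_bytes) -> actual_bytes stands in for resize + save.
    Returns the final size and the number of iterations used.
    """
    target = max_file_size * (1 - delta)
    ratio, size, steps = 1.0, original_size, 0
    while abs(1 - target / size) > delta and steps < step_limit:
        size = encode(target * ratio)  # ask for a corrected size
        ratio *= target / size         # scale the next request by the observed error
        steps += 1
    return size, steps


# A made-up encoder that always produces files 20% larger than requested.
size, steps = converge(lambda requested: 1.2 * requested,
                       original_size=1531494, max_file_size=2 ** 20)
print(steps, size <= 2 ** 20)  # 2 True
```

With a perfectly linear error like this one, the second request already compensates for the overshoot. Real images tend to need a few more iterations because the error changes with the image size.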

Test cases

For testing I used two different approaches: randomly generated images and an example image from Google.

For the random images I used the following code:

def generate_test_image(ratio: Tuple[int, int], file_size: int) -> np.ndarray:
    """
        Generate a test image with fixed width height ratio and an approximate size.

        :param ratio: (Tuple[int, int]) screen ratio for the image
        :param file_size: (int) Approximate size of the image, note that this may be off due to image compression.
    """
    height, width = ratio  # Numpy reverse values
    scale = int(np.sqrt(file_size // (width * height)))
    img = np.random.randint(0, 255, (width * scale, height * scale, 3), dtype=np.uint8)
    return img

Results

Using a randomly generated image

image_location = "test image random.jpg"
# Generate a large image with fixed ratio and a file size of ~1.7MB
image = generate_test_image(ratio=(16, 9), file_size=1531494)
cv2.imwrite(image_location, image)

Memory resize: 1.71, 0.99 MB, number of steps 2

In 2 steps it reduces the original size from 1.7 MB to 0.99 MB.

(before)

(after)

Using a google image

Memory resize: 1.51, 0.996 MB, number of steps 4

In 4 steps it reduces the original size from 1.51 MB to 0.996 MB.

(before)

(after)

Bonus

It also works for .png, .jpeg, .tiff, etc.
Besides downscaling, it can also be used to upscale images to a certain file size.
The aspect ratio of the image is maintained as well as possible.

Edit

I made the code a bit more user friendly and added Mark Setchell's suggestion of using io.BytesIO, which speeds up the code by roughly a factor of 2. There is also a step_limit that prevents endless looping if delta is very small.

import io
import os
import time
from typing import Tuple

import cv2
import numpy as np
from PIL import Image


def generate_test_image(ratio: Tuple[int, int], file_size: int) -> np.ndarray:
    """
        Generate a test image with fixed width height ratio and an approximate size.

        :param ratio: (Tuple[int, int]) screen ratio for the image
        :param file_size: (int) Approximate size of the image, note that this may be off due to image compression.
    """
    height, width = ratio  # Numpy reverse values
    scale = int(np.sqrt(file_size // (width * height)))
    img = np.random.randint(0, 255, (width * scale, height * scale, 3), dtype=np.uint8)
    return img


def _change_image_memory(path, file_size: int = 2 ** 20):
    """
        Tries to match the image memory to a specific file size.

        :param path: (str) Path to the image
        :param file_size: (int) Size of the file in bytes
        :return: (np.ndarray) rescaled version of the image
    """
    image = cv2.imread(path)
    height, width = image.shape[:2]

    original_memory = os.stat(path).st_size
    original_bytes_per_pixel = original_memory / np.prod(image.shape[:2])

    # perform resizing calculation
    new_bytes_per_pixel = original_bytes_per_pixel * (file_size / original_memory)
    new_bytes_ratio = np.sqrt(new_bytes_per_pixel / original_bytes_per_pixel)
    new_width, new_height = int(new_bytes_ratio * width), int(new_bytes_ratio * height)

    new_image = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_LINEAR_EXACT)
    return new_image


def _get_size_of_image(image):
    # Encode into memory and get size
    buffer = io.BytesIO()
    image = Image.fromarray(image)
    image.save(buffer, format="JPEG")
    size = buffer.getbuffer().nbytes
    return size


def limit_image_memory(path, max_file_size: int, delta: float = 0.05, step_limit=10):
    """
        Reduces an image to the required max file size.

        :param path: (str) Path to the original (unchanged) image.
        :param max_file_size: (int) maximum size of the image
        :param delta: (float) maximum allowed variation from the max file size.
            This is a value between 0 and 1, relative to the max file size.
        :param step_limit: (int) maximum number of iterations before giving up.
        :return: an image path to the limited image.
    """
    start_time = time.perf_counter()
    max_file_size = max_file_size * (1 - delta)
    max_deviation_percentage = delta
    new_image = None

    current_memory = new_memory = os.stat(path).st_size
    ratio = 1
    steps = 0

    while abs(1 - max_file_size / new_memory) > max_deviation_percentage:
        new_image = _change_image_memory(path, file_size=max_file_size * ratio)
        new_memory = _get_size_of_image(new_image)
        ratio *= max_file_size / new_memory
        steps += 1

        # prevent endless looping
        if steps > step_limit:
            break

    print(f"Stats:"
          f"\n\t- Original memory size: {current_memory / 2 ** 20:9.2f} MB"
          f"\n\t- New memory size     : {new_memory / 2 ** 20:9.2f} MB"
          f"\n\t- Number of steps {steps}"
          f"\n\t- Time taken: {time.perf_counter() - start_time:5.3f} seconds")

    if new_image is not None:
        cv2.imwrite(f"resize {path}", new_image)
        return f"resize {path}"
    return path


if __name__ == '__main__':
    image_location = "your nice image.jpg"

    # Uncomment to generate random test images
    # test_image = generate_test_image(ratio=(16, 9), file_size=1567289)
    # cv2.imwrite(image_location, test_image)

    path = limit_image_memory(image_location, max_file_size=2 ** 20, delta=0.01)
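
For readers who want to stay with Pillow only, as in the question, the in-memory idea can be combined with the binary search over the side length that Mark Setchell mentions in the comments. The following is a sketch under assumptions rather than a drop-in replacement: the function names, the fixed JPEG `quality=85`, and the number of search steps are all illustrative choices, and it assumes the encoded size grows with the scale factor.

```python
import io

from PIL import Image


def encode_jpeg(image, scale, quality=85):
    """Resize by `scale` and return the JPEG-encoded bytes (nothing touches disk)."""
    width, height = image.size
    new_size = (max(1, int(width * scale)), max(1, int(height * scale)))
    buffer = io.BytesIO()
    resized = image.convert("RGB").resize(new_size, Image.LANCZOS)
    resized.save(buffer, format="JPEG", quality=quality, optimize=True)
    return buffer.getvalue()


def shrink_to_limit(image, max_bytes, steps=8):
    """Binary-search the largest scale whose JPEG fits in max_bytes.

    Returns the encoded bytes, or None if no tried scale fits.
    """
    data = encode_jpeg(image, 1.0)
    if len(data) <= max_bytes:
        return data  # already small enough at full size
    low, high, best = 0.0, 1.0, None
    for _ in range(steps):
        mid = (low + high) / 2
        data = encode_jpeg(image, mid)
        if len(data) <= max_bytes:
            best, low = data, mid  # fits: try a larger scale
        else:
            high = mid             # too big: try a smaller scale
    return best


# Hypothetical usage (file name is a placeholder):
# data = shrink_to_limit(Image.open("testfile.jpg"), 2 ** 20)
```

The returned bytes can be written to disk as they are, for example with `open("reduced.jpg", "wb").write(data)`, so the image does not have to be encoded a second time and the file on disk has exactly the size that was measured.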

You don't need to actually write the file to physical disk and then `stat()` it - you can write to memory using an `io.BytesIO` like in my comment above and then get the size of the memory buffer. You can also write the memory buffer directly to disk when you get the correct size without needing to JPEG encode it again. Both of those should make it faster, hopefully :-) — Mark Setchell, Mar 03 '21 at 14:44
They are only thoughts/suggestions - you already have my vote :-) — Mark Setchell, Mar 03 '21 at 14:49
I implemented that part, and the code is now around two times as fast, but I think the result will be more visible for larger images. — Thymen, Mar 03 '21 at 14:58
Thanks a lot for your explanations, great thing. That's really a wonderful solution. — Papageno, Mar 04 '21 at 13:12