Long story short, you cannot know in advance how well an image will compress, because that depends heavily on what kind of image it is. That said, we can optimize your code.
Some optimizations:
- Approximate the number of bytes per pixel from the file size and the image dimensions (width * height).
- Iteratively update a scaling ratio, based on the file size each attempt produces compared with the preferred file size.
My solution applies both of the above methods, because applying them separately did not converge very stably. The following sections explain both parts in more depth and show the test cases that I considered.
Reducing image memory
The following code approximates the new image dimensions from the original file size and the preferred file size (both in bytes). It estimates the number of bytes per pixel, then applies the ratio between the preferred and the original bytes per pixel to the image width and height. Because the pixel count grows with width * height, the square root of that ratio is applied to each dimension.
I use opencv-python (cv2) for the image rescaling, but you can swap that for your own code.
import os

import cv2
import numpy as np


def reduce_image_memory(path, max_file_size: int = 2 ** 20):
    """
    Reduce the image memory by downscaling the image.
    :param path: (str) Path to the image
    :param max_file_size: (int) Maximum size of the file in bytes
    :return: (np.ndarray) downscaled version of the image
    """
    image = cv2.imread(path)
    if image is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(f"Could not read image: {path}")
    height, width = image.shape[:2]
    original_memory = os.stat(path).st_size
    # np.prod replaces np.product, which was removed in NumPy 2.0
    original_bytes_per_pixel = original_memory / np.prod(image.shape[:2])
    # perform resizing calculation
    new_bytes_per_pixel = original_bytes_per_pixel * (max_file_size / original_memory)
    # the square root spreads the area ratio evenly over width and height
    new_bytes_ratio = np.sqrt(new_bytes_per_pixel / original_bytes_per_pixel)
    new_width, new_height = int(new_bytes_ratio * width), int(new_bytes_ratio * height)
    new_image = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_LINEAR_EXACT)
    return new_image
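To make the square root step concrete, here is a minimal sketch of just the dimension calculation with plain numbers. The helper name `scaled_dimensions` and the 1920x1080 example are made up for illustration. It assumes the file size grows linearly with the pixel count, which is the same simplification the function above makes.

```python
import math


def scaled_dimensions(width, height, original_bytes, target_bytes):
    """Return (new_width, new_height), assuming file size is proportional to pixel count."""
    # The pixel count scales with the square of the linear factor, hence the square root.
    factor = math.sqrt(target_bytes / original_bytes)
    return int(factor * width), int(factor * height)


# Hypothetical 1920x1080 image of 1.7 MB that should shrink to 1 MB
print(scaled_dimensions(1920, 1080, 1.7 * 2 ** 20, 2 ** 20))  # (1472, 828)
```

Real encoders rarely follow this linear model exactly, which is why the first guess usually needs the correction described in the next section.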
Applying ratio
Most of the magic happens in ratio *= max_file_size / new_memory, where we calculate our error with respect to the preferred size and correct our ratio with that value.
The program keeps searching until the ratio satisfies the following condition:
abs(1 - max_file_size / new_memory) <= max_deviation_percentage
This means that the new file size has to be relatively close to the preferred file size. You control this closeness with delta. The higher delta is, the smaller your file can end up (further below max_file_size). The smaller delta is, the closer the new file size will be to max_file_size. Because the search aims at max_file_size * (1 - delta), a result that meets the condition will never be larger than the original limit.
The trade-off is time: the smaller delta is, the more iterations it takes to find a ratio satisfying the condition. Empirical testing shows that values between 0.01 and 0.05 work well.
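The "never larger" claim follows from the stop condition. As a small sketch (the helper `accepted_size_range` is hypothetical, not part of the solution), the range of file sizes that ends the search can be computed directly, assuming the target is lowered to the limit times (1 - delta) as in the script below:

```python
def accepted_size_range(limit, delta):
    """Return the (smallest, largest) file size in bytes that ends the search."""
    target = limit * (1 - delta)      # the search aims slightly below the limit
    smallest = target / (1 + delta)   # from target / new_memory - 1 <= delta
    largest = target / (1 - delta)    # from 1 - target / new_memory <= delta
    return smallest, largest


low, high = accepted_size_range(2 ** 20, 0.01)
print(f"{low / 2 ** 20:.4f} MB to {high / 2 ** 20:.4f} MB")  # 0.9802 MB to 1.0000 MB
```

So with delta = 0.01 and a 1 MB limit, any result between roughly 0.98 MB and 1 MB is accepted.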
if __name__ == '__main__':
    image_location = "test img.jpg"
    # delta denotes the maximum variation allowed around the max_file_size
    # The lower the delta, the more time it takes, but the closer the result will be to `max_file_size`.
    delta = 0.01
    max_file_size = 2 ** 20 * (1 - delta)
    max_deviation_percentage = delta
    current_memory = new_memory = os.stat(image_location).st_size
    ratio = 1
    steps = 0
    # make sure that the comparison is within a certain deviation.
    while abs(1 - max_file_size / new_memory) > max_deviation_percentage:
        new_image = reduce_image_memory(image_location, max_file_size=max_file_size * ratio)
        cv2.imwrite(f"resize {image_location}", new_image)
        new_memory = os.stat(f"resize {image_location}").st_size
        # correct the ratio by the relative error of this attempt
        ratio *= max_file_size / new_memory
        steps += 1
    print(f"Memory resize: {current_memory / 2 ** 20:5.2f}, {new_memory / 2 ** 20:6.4f} MB, number of steps {steps}")
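The correction loop can also be tried without any image files. The sketch below runs the same update rule against a made-up encoder model in which the file size follows a power law of the requested size. The function `simulate_search`, the `exponent` parameter and the power law itself are assumptions for illustration only, not a model of how JPEG or PNG actually behave.

```python
def simulate_search(original_size, limit, delta, exponent=0.8, step_limit=50):
    """Run the ratio correction loop against a hypothetical encoder.

    The fake encoder returns original_size * (requested / original_size) ** exponent,
    so with exponent < 1 the file shrinks more slowly than the linear model expects.
    """
    target = limit * (1 - delta)
    new_size = original_size
    ratio = 1.0
    steps = 0
    while abs(1 - target / new_size) > delta and steps < step_limit:
        requested = target * ratio
        new_size = original_size * (requested / original_size) ** exponent
        # same correction as in the script above: scale by the relative error
        ratio *= target / new_size
        steps += 1
    return new_size, steps


size, steps = simulate_search(1.7 * 2 ** 20, 2 ** 20, delta=0.01)
print(f"{size / 2 ** 20:.4f} MB after {steps} steps")
```

With this toy model the relative error shrinks by a constant factor on every step, which matches the small step counts reported in the test cases. When exponent is 1.0 the linear model is exact and a single step is enough.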
Test cases
For testing I used two different approaches: randomly generated images and an example image from Google.
For the random images I used the following code:
from typing import Tuple


def generate_test_image(ratio: Tuple[int, int], file_size: int) -> np.ndarray:
    """
    Generate a test image with fixed width height ratio and an approximate size.
    :param ratio: (Tuple[int, int]) screen ratio for the image, as (width, height)
    :param file_size: (int) Approximate size of the image, note that this may be off due to image compression.
    :return: (np.ndarray) image filled with random noise
    """
    width, height = ratio
    # built-in int replaces np.int, which was removed in NumPy 1.24
    scale = int(np.sqrt(file_size // (width * height)))
    # NumPy arrays are indexed (rows, columns), i.e. (height, width)
    img = np.random.randint(0, 255, (height * scale, width * scale, 3), dtype=np.uint8)
    return img
Results
- Using a randomly generated image
image_location = "test image random.jpg"
# Generate a large image with fixed ratio and a file size of ~1.7MB
image = generate_test_image(ratio=(16, 9), file_size=1531494)
cv2.imwrite(image_location, image)
Memory resize: 1.71, 0.99 MB, number of steps 2
In 2 steps it reduces the original size from 1.7 MB to 0.99 MB.
(before)

(after)

- Using an example image from Google
Memory resize: 1.51, 0.996 MB, number of steps 4
In 4 steps it reduces the original size from 1.51 MB to 0.996 MB.
(before)

(after)

Bonus
- It also works for .png, .jpeg, .tiff, etc.
- Besides downscaling, it can also be used to upscale images to a certain memory consumption.
- The image aspect ratio is maintained as well as possible.
Edit
I made the code a bit more user friendly and added the suggestion from Mark Setchell to use io.BytesIO, which speeds up the code by roughly a factor of 2. There is also a step_limit that prevents endless looping if delta is very small.
import io
import os
import time
from typing import Tuple
import cv2
import numpy as np
from PIL import Image


def generate_test_image(ratio: Tuple[int, int], file_size: int) -> np.ndarray:
    """
    Generate a test image with fixed width height ratio and an approximate size.
    :param ratio: (Tuple[int, int]) screen ratio for the image, as (width, height)
    :param file_size: (int) Approximate size of the image, note that this may be off due to image compression.
    :return: (np.ndarray) image filled with random noise
    """
    width, height = ratio
    # built-in int replaces np.int, which was removed in NumPy 1.24
    scale = int(np.sqrt(file_size // (width * height)))
    # NumPy arrays are indexed (rows, columns), i.e. (height, width)
    img = np.random.randint(0, 255, (height * scale, width * scale, 3), dtype=np.uint8)
    return img


def _change_image_memory(path, file_size: int = 2 ** 20):
    """
    Tries to match the image memory to a specific file size.
    :param path: (str) Path to the image
    :param file_size: (int) Size of the file in bytes
    :return: (np.ndarray) rescaled version of the image
    """
    image = cv2.imread(path)
    if image is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(f"Could not read image: {path}")
    height, width = image.shape[:2]
    original_memory = os.stat(path).st_size
    # np.prod replaces np.product, which was removed in NumPy 2.0
    original_bytes_per_pixel = original_memory / np.prod(image.shape[:2])
    # perform resizing calculation
    new_bytes_per_pixel = original_bytes_per_pixel * (file_size / original_memory)
    # the square root spreads the area ratio evenly over width and height
    new_bytes_ratio = np.sqrt(new_bytes_per_pixel / original_bytes_per_pixel)
    new_width, new_height = int(new_bytes_ratio * width), int(new_bytes_ratio * height)
    new_image = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_LINEAR_EXACT)
    return new_image


def _get_size_of_image(image):
    # Encode as JPEG into memory and get the size, without touching the disk
    buffer = io.BytesIO()
    image = Image.fromarray(image)
    image.save(buffer, format="JPEG")
    size = buffer.getbuffer().nbytes
    return size


def limit_image_memory(path, max_file_size: int, delta: float = 0.05, step_limit=10):
    """
    Reduces an image to the required max file size.
    :param path: (str) Path to the original (unchanged) image.
    :param max_file_size: (int) maximum size of the image
    :param delta: (float) maximum allowed variation from the max file size.
        This is a value between 0 and 1, relative to the max file size.
    :param step_limit: (int) maximum number of iterations before giving up.
    :return: an image path to the limited image.
    """
    start_time = time.perf_counter()
    max_file_size = max_file_size * (1 - delta)
    max_deviation_percentage = delta
    new_image = None

    # use the `path` argument here, not the global `image_location`
    current_memory = new_memory = os.stat(path).st_size
    ratio = 1
    steps = 0

    while abs(1 - max_file_size / new_memory) > max_deviation_percentage:
        new_image = _change_image_memory(path, file_size=max_file_size * ratio)
        new_memory = _get_size_of_image(new_image)
        ratio *= max_file_size / new_memory
        steps += 1

        # prevent endless looping
        if steps > step_limit:
            break

    print(f"Stats:"
          f"\n\t- Original memory size: {current_memory / 2 ** 20:9.2f} MB"
          f"\n\t- New memory size     : {new_memory / 2 ** 20:9.2f} MB"
          f"\n\t- Number of steps {steps}"
          f"\n\t- Time taken: {time.perf_counter() - start_time:5.3f} seconds")

    if new_image is not None:
        cv2.imwrite(f"resize {path}", new_image)
        return f"resize {path}"
    return path


if __name__ == '__main__':
    image_location = "your nice image.jpg"

    # Uncomment to generate random test images
    # test_image = generate_test_image(ratio=(16, 9), file_size=1567289)
    # cv2.imwrite(image_location, test_image)

    path = limit_image_memory(image_location, max_file_size=2 ** 20, delta=0.01)