
Edit: This question has been marked a duplicate? My question is clearly about optimising this process, not HOW to do it. I even provided code to prove that I had already figured out the latter. Do you internet hall monitors even read these questions past the title before you flag them?

I have the following block of code to compress an image using PIL, until said image is under a certain size.

from PIL import Image
import os

def compress(image_file, max_size, scale):
    while os.path.getsize(image_file) > max_size:
        pic = Image.open(image_file)
        original_size = pic.size
        pic = pic.resize((int(original_size[0] * scale),
            int(original_size[1] * scale)),
            Image.ANTIALIAS)
        pic.save(image_file, optimize=True, quality=95)

In this code, I use os.path.getsize(image_file) to get the size of the image. However, this means that the file must be saved to disk by pic.save(image_file, optimize=True, quality=95) every time the loop runs.

That process takes a long time.

Is there a way to optimise this by somehow getting the size of the image in the PIL Image object pic?

ning
  • It's not clear what you are trying to do. The loop resizes the image, so saving is necessary. If all you want is size information then don't resize or save the file. – aris Nov 14 '16 at 11:23
  • I assume that you are saving as JPEG. You can save some time by saving the file data to a BytesIO object in memory instead of to disk. That will also make it faster to get the resulting file size. However, it won't speed up the encoding process. BTW, there's not much point using quality 95. It's very slow, it produces large file sizes, and the visual difference between 90 & 95 is rarely noticeable. And 85 is often quite adequate, depending on the nature of the image. – PM 2Ring Nov 14 '16 at 11:36
  • I should also mention that the image scaling routines in PIL / Pillow are not very high quality, although you mightn't notice that if the image is large enough and is a photograph with lots of smooth tone transitions rather than a computer-generated image with lots of zones of high contrast. Also you should **not** progressively edit a JPEG. Don't save a scaled image, then reload it, and rescale the already-scaled image. You will lose quality very quickly that way. If you must try different scales until the file size is small enough, generate each new version from the original. – PM 2Ring Nov 14 '16 at 11:43

2 Answers


Use io.BytesIO() to save the image into memory. It is also probably better to resize from your original file each time as follows:

from PIL import Image
import os
import io

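def compress(original_file, max_size, scale):
    # scale must shrink the image on every pass, otherwise the loop never ends
    assert 0.0 < scale < 1.0
    orig_image = Image.open(original_file)
    cur_size = orig_image.size

    while True:
        # Always resize from the original image to avoid compounding JPEG losses
        cur_size = (int(cur_size[0] * scale), int(cur_size[1] * scale))
        # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
        resized_file = orig_image.resize(cur_size, Image.LANCZOS)

        with io.BytesIO() as file_bytes:
            # Encode into memory so the size can be checked without touching the disk
            resized_file.save(file_bytes, optimize=True, quality=95, format='jpeg')

            if file_bytes.tell() <= max_size:
                # Small enough: write the encoded bytes over the original file
                file_bytes.seek(0, 0)
                with open(original_file, 'wb') as f_output:
                    f_output.write(file_bytes.read())
                break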

compress(r"c:\mytest.jpg", 10240, 0.9) 

So this will take the file and scale it down by a further factor of 0.9 on each attempt until a suitable size is reached. It then overwrites the original file.
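The in-memory size check at the heart of this approach can be pulled out into a small helper. The following is only a minimal sketch: the name encoded_size is hypothetical (it is not part of Pillow), and it assumes the image mode is one the chosen format can store (for example RGB for JPEG).

```python
import io
from PIL import Image

def encoded_size(image, **save_opts):
    """Return how many bytes `image` occupies when encoded with `save_opts`.

    Hypothetical helper, not a Pillow API: it encodes into an in-memory
    buffer, so nothing is written to disk.
    """
    with io.BytesIO() as buffer:
        image.save(buffer, **save_opts)
        # Seeking to the end returns the total number of bytes written
        return buffer.seek(0, io.SEEK_END)

if __name__ == "__main__":
    demo = Image.new("RGB", (64, 64), "white")  # stand-in for a real photo
    print(encoded_size(demo, format="jpeg", quality=85))
```

This removes the disk round trip, but the encoding itself still has to run once for every size that is measured.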


As an alternative approach, you could create a list of scales to try, e.g. [0.001, 0.002 ... 0.999, 1.0], and then use a binary chop to determine which scale results in a file size closest to (but below) max_size, as follows:

def compress(original_file, max_size):
    save_opts = {'optimize': True, 'quality': 95, 'format': 'jpeg'}
    orig_image = Image.open(original_file)
    width, height = orig_image.size
    scales = [scale / 1000 for scale in range(1, 1001)]  # e.g. [0.001, 0.002 ... 1.0]

    # Binary search for the first scale whose encoded size reaches max_size
    lo = 0
    hi = len(scales)

    while lo < hi:
        mid = (lo + hi) // 2

        scaled_size = (int(width * scales[mid]), int(height * scales[mid]))
        # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
        resized_file = orig_image.resize(scaled_size, Image.LANCZOS)

        # Encode into memory only; nothing is written to disk during the search
        file_bytes = io.BytesIO()
        resized_file.save(file_bytes, **save_opts)
        size = file_bytes.tell()
        print(size, scales[mid])

        if size < max_size:
            lo = mid + 1
        else:
            hi = mid

    # lo is the first scale that is too big, so step back one to stay under max_size
    scale = scales[max(0, lo-1)]
    print("Using scale:", scale)
    orig_image.resize((int(width * scale), int(height * scale)), Image.LANCZOS).save(original_file, **save_opts)

So for a max_size of 10000, the loop first tries a scale of 0.501; if the result is too big, 0.251 is tried, and so on. With max_size=10000, the following file sizes and scales were tried:

180287 0.501
56945 0.251
17751 0.126
5371 0.063
10584 0.095
7690 0.079
9018 0.087
10140 0.091
9336 0.089
9948 0.09
Using scale: 0.09
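The search itself does not depend on PIL, so it can be sketched separately from the encoding. This is a minimal illustration rather than the code above: largest_scale_under and size_at_scale are hypothetical names, and the sketch assumes the encoded size never decreases as the scale grows, which is usually but not strictly true for JPEG.

```python
def largest_scale_under(max_size, size_at_scale, steps=1000):
    """Binary-search the largest scale k/steps whose size fits in max_size.

    size_at_scale is a caller-supplied function (hypothetical here) that
    returns the encoded size in bytes for a given scale. It is assumed to be
    non-decreasing in the scale. Returns None if even the smallest scale
    is too big.
    """
    lo, hi = 1, steps
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        scale = mid / steps
        if size_at_scale(scale) <= max_size:
            best = scale      # fits: remember it and try a larger scale
            lo = mid + 1
        else:
            hi = mid - 1      # too big: try a smaller scale
    return best

if __name__ == "__main__":
    # Fake size function for demonstration: 100 bytes per 0.001 of scale
    fake_size = lambda scale: round(scale * 1000) * 100
    print(largest_scale_under(10000, fake_size))
```

With 1000 candidate scales, the search needs at most 10 encodes, compared with a potentially much longer run when stepping down by a fixed factor.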
Martin Evans
  • One should definitely check that scale is > 0.0 and < 1.0, to avoid falling into an endless loop or creating a memory-eating monster... :) – ferdy Nov 14 '16 at 11:59

Possibly, you might have a look at this answer using StringIO to perform the operations in memory.

import StringIO

output = StringIO.StringIO() 
image.save(output) 
contents = output.getvalue() 
output.close()

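When saving to a file-like object, PIL has no file extension to infer the output format from, so specify the format explicitly:

image.save(output, format="GIF")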
ferdy
  • That answer is rather old. `StringIO` is fine for Python 2, but in Python 3 you need to use `io.BytesIO()`. – PM 2Ring Nov 14 '16 at 11:45
  • Okay, so possibly that's the way to go in Py3. But the principle is the same: Use a variable in RAM, perform your operations on it and then save the image back in a file. – ferdy Nov 14 '16 at 11:48
  • Certainly. But the OP is looking for a Python 3 solution, and in Python 3 `import StringIO` raises `ImportError: No module named 'StringIO'`. And of course you can't put bytes from an image into a Python 3 string, you need to put them in a bytes or bytearray object. – PM 2Ring Nov 14 '16 at 11:52
  • If op is really relying on python 3 then PIL is not the right solution anyway. He then wants to use `pip install Pillow` which btw is providing python 2 and 3 support afaik. – ferdy Nov 14 '16 at 11:58
  • The OP has the `python-3.x` tag on the question, so they must be using the Pillow fork, since there isn't a Python 3 version of the original PIL (which hasn't been maintained for several years now). And yes, Pillow works as a drop-in replacement for PIL. So you still do `from PIL import Image`, etc. FWIW, the current SO Python policy is to assume Python 3 unless the question explicitly mentions Python 2. – PM 2Ring Nov 14 '16 at 12:36
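For reference, a Python 3 version of the snippet in this answer, along the lines the comments describe, might look like the sketch below. The stand-in image is an assumption made only so the example runs on its own; in practice it would come from Image.open.

```python
import io
from PIL import Image

# Stand-in image so the sketch is self-contained (an assumption, not from the thread)
image = Image.new("RGB", (32, 32), "red")

# io.BytesIO replaces Python 2's StringIO.StringIO for binary image data
with io.BytesIO() as output:
    # The format must be given explicitly: a buffer has no file extension
    image.save(output, format="GIF")
    contents = output.getvalue()  # bytes object holding the encoded image

print(len(contents))
```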