
I am trying to download an original image (PNG format) by URL, convert it on the fly (without saving to disk), and save it as a JPEG.

The code is as follows:

import os
import io
import requests
from PIL import Image
...
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    i = Image.open(io.BytesIO(r.content))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)

It works, but when I try to monitor the download progress (for a future progress bar) with r.iter_content() like this:

r = requests.get(img_url, stream=True)
if r.status_code == 200:
    for chunk in r.iter_content():
        print(len(chunk))
    i = Image.open(io.BytesIO(r.content))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)

I get this error:

Traceback (most recent call last):
  File "E:/GitHub/geoportal/quicklookScrape/temp.py", line 37, in <module>
    i = Image.open(io.BytesIO(r.content))
  File "C:\Python35\lib\site-packages\requests\models.py", line 736, in content
    'The content for this response was already consumed')
RuntimeError: The content for this response was already consumed

So is it possible to monitor the download progress and then get the data itself?

Vasily

1 Answer


When using r.iter_content(), you need to buffer the results somewhere. Unfortunately, I can't find any examples where the contents get appended to an object in memory--usually, iter_content is used when a file can't or shouldn't be loaded entirely into memory at once. However, you can buffer it using a tempfile.SpooledTemporaryFile, as described in this answer: https://stackoverflow.com/a/18550652/4527093. This will prevent saving the image to disk (unless the image is larger than the specified max_size). Then you can create the Image from the tempfile.

import os
import io
import requests
from PIL import Image
import tempfile

buffer = tempfile.SpooledTemporaryFile(max_size=1e9)  # stays in memory until max_size bytes
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    downloaded = 0
    filesize = int(r.headers['content-length'])
    for chunk in r.iter_content(chunk_size=1024):
        downloaded += len(chunk)
        buffer.write(chunk)
        print(downloaded/filesize)  # progress as a fraction of the total size
    buffer.seek(0)  # rewind the buffer before reading it back
    i = Image.open(io.BytesIO(buffer.read()))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)
buffer.close()

Edited to include chunk_size, which limits the progress updates to one per 1 KB instead of one per byte.
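If a temporary file feels like overkill, the same progress tracking can also be done by accumulating the chunks directly in an in-memory io.BytesIO buffer. This is a sketch of that alternative (not part of the original answer), with the progress logic factored into a hypothetical helper and the requests calls shown as comments so the example runs standalone:

```python
import io

def buffer_with_progress(chunks, total_size):
    # Accumulate streamed chunks in an in-memory buffer, printing progress.
    buf = io.BytesIO()
    downloaded = 0
    for chunk in chunks:
        downloaded += len(chunk)
        buf.write(chunk)
        print(downloaded / total_size)
    buf.seek(0)  # rewind so the buffer can be read from the start
    return buf

# With requests this would be used as (not run here):
#   r = requests.get(img_url, stream=True)
#   buf = buffer_with_progress(r.iter_content(chunk_size=1024),
#                              int(r.headers['content-length']))
#   Image.open(buf).save(os.path.join(out_dir, 'image.jpg'), quality=85)

# Simulated download: 4 chunks of 1 KB each
buf = buffer_with_progress([b"x" * 1024] * 4, 4096)
print(len(buf.read()))  # 4096
```

Note that, unlike SpooledTemporaryFile, a plain BytesIO never falls back to disk, so this is only appropriate when the images comfortably fit in memory.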

dbc
  • Thank you very much, dbc! And what if I just use an ordinary TemporaryFile? `with TemporaryFile() as tempf:`, write `chunks` to it and then read it with `i = Image.open(tempf)`? Won't it be easier? – Vasily Jun 10 '16 at 15:59
  • That would work, but using `TemporaryFile` will actually write the bytes to disk as they come in. Using `SpooledTemporaryFile` will keep the bytes in memory, and therefore might be faster--and it's what you specified in your question. :) – dbc Jun 10 '16 at 16:24
  • Note that the default `chunk_size` is 1 byte, so this will print a *lot* and slow progress. You can specify `r.iter_content(chunk_size=1024)` for example. – Nick P Jan 23 '21 at 18:18
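dbc's point in the comments, that SpooledTemporaryFile keeps the bytes in memory until writes exceed max_size, can be checked directly. This small standalone sketch inspects `_rolled`, a CPython implementation detail (not public API), purely for illustration:

```python
import tempfile

buf = tempfile.SpooledTemporaryFile(max_size=8)
buf.write(b"1234")      # 4 bytes written: under max_size, stays in memory
print(buf._rolled)      # False (implementation detail, not public API)
buf.write(b"56789abc")  # 12 bytes total: exceeds max_size, spools to disk
print(buf._rolled)      # True
buf.close()
```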