Reading image from url takes too long

Question

Here’s the code I used:

import io

import requests
from PIL import Image

# df1.URL is a DataFrame column holding one image URL per row
response = requests.get(df1.URL[0]).content

im = Image.open(io.BytesIO(response))

The image is very large. Is there a way to speed things up? EDIT: I don't want to save the image to disk. I just want to read it on the fly.

@MarkSetchell df1.URL is a column of the dataframe that contains the url of the image. Every row contains one url. But I'm going to do this for all rows. — user, Jan 02 '20 at 10:29
Does this answer your question? [How do I read image data from a URL in Python?](https://stackoverflow.com/questions/7391945/how-do-i-read-image-data-from-a-url-in-python) — αԋɱҽԃ αмєяιcαη, Jan 02 '20 at 10:37
If you have lots of images to download, the answer is different, but your question title and code both imply you only have one image! You should look at multithreading to get lots of I/O done in parallel. — Mark Setchell, Jan 02 '20 at 10:48
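The multithreading suggestion in the comment above can be sketched roughly as follows. This is a minimal sketch, not a tested solution for the asker's data: the names `fetch_image` and `fetch_all`, the 30-second timeout, and `max_workers=8` are assumptions chosen for illustration. Nothing is written to disk; each image is decoded from an in-memory buffer.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import requests
from PIL import Image


def fetch_image(url, timeout=30):
    """Download one URL and decode it in memory (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    image = Image.open(io.BytesIO(response.content))
    image.load()  # Image.open is lazy; force the decode inside the worker
    return image


def fetch_all(urls, fetch=fetch_image, max_workers=8):
    """Run fetch over urls in a thread pool; results keep the order of urls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

With the question's DataFrame this might be called as `images = fetch_all(df1.URL.tolist())`. Holding thousands of decoded multi-megabyte images in memory at once may not fit, so processing the URLs in smaller batches is probably safer.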

αԋɱҽԃ αмєяιcαη · Answer 1 · 2020-01-02T10:57:31.047

2

To make the question clearer for now: could you please time your code against the code below on a single image and let us know the difference?

If you need to deal with multiple images, you will need threading, e.g. concurrent.futures.

import requests

r = requests.get(url)

with open("out.jpg", 'wb') as f:
    f.write(r.content)

Also, please set stream=True and give it a try:

import io

import requests
from PIL import Image

response = requests.get(df1.URL[0], stream=True).content

im = Image.open(io.BytesIO(response))
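One way to check where the time actually goes is to time the download and the decode separately. The sketch below assumes a hypothetical helper `time_steps`; the `get` parameter exists only so the request function can be swapped out. Two details are worth keeping in mind: accessing `.content` reads the whole response body even when `stream=True` is set, and `Image.open` only parses the header, so the real decode happens at `load()`.

```python
import io
import time

import requests
from PIL import Image


def time_steps(url, get=requests.get):
    """Return (download_seconds, decode_seconds, image) for a single URL."""
    start = time.perf_counter()
    data = get(url, timeout=30).content  # full body is read here
    downloaded = time.perf_counter()
    image = Image.open(io.BytesIO(data))
    image.load()  # force the actual pixel decode
    decoded = time.perf_counter()
    return downloaded - start, decoded - downloaded, image
```

Calling `time_steps(df1.URL[0])` on one of the large images should show whether the network transfer or the decode dominates.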

edited Jan 02 '20 at 10:57

answered Jan 02 '20 at 09:49

αԋɱҽԃ αмєяιcαη


Why will this read the image faster? It appears to use the same `requests.get()` as the OP. – Mark Setchell Jan 02 '20 at 09:54
@MarkSetchell the point here is `PIL` check https://stackoverflow.com/questions/59495998/unable-to-download-image-using-request-in-python-url-send-me-html-in-respons/59496067#59496067 – αԋɱҽԃ αмєяιcαη Jan 02 '20 at 10:15
@αԋɱҽԃαмєяιcαη Is there a way to not save image? I don't have much disk space on google colab. So this will be a problem going forwards – user Jan 02 '20 at 10:25
@bookfreak what's the size of `pic`? – αԋɱҽԃ αмєяιcαη Jan 02 '20 at 10:32
@αԋɱҽԃαмєяιcαη It varies, but from 3 MB and up. And there are 5000 of them. Each URL fetches one image, so overall I'll fetch 5000 images. – user Jan 02 '20 at 10:34
@bookfreak check my updated answer and let me know. – αԋɱҽԃ αмєяιcαη Jan 02 '20 at 10:34
@αԋɱҽԃαмєяιcαη that's my initial code. It's the one that loads for a very long time. Your initial answer does the job quickly but has the problem of storage. – user Jan 02 '20 at 10:37
@bookfreak pay attention that i used `StringIO` – αԋɱҽԃ αмєяιcαη Jan 02 '20 at 10:38
@ αԋɱҽԃαмєяιcαη Sorry didn't see it. It gives TypeError: initial_value must be str or None, not bytes. – user Jan 02 '20 at 10:44
I have read the link you provided but I still fail to see why or how `response = requests.get(df1.URL[0]).content` can run at a different speed from `r = requests.get(url)` – Mark Setchell Jan 02 '20 at 10:50
@MarkSetchell I'm not talking about `requests.get`; I'm telling you that `PIL` causes slowness when opening the image on the fly, which is why I commented `the point here is PIL`. Indeed, `requests.get(df1.URL[0]).content` is equivalent to `r = requests.get(url)`, but I'm talking about the operation done after the request. There's a big difference between downloading the image and streaming the image on the fly. – αԋɱҽԃ αмєяιcαη Jan 02 '20 at 10:53
So, if the difference is not in `requests.get()`, are you saying that `img = Image.open(StringIO(response.content))` is significantly faster than `im = Image.open(io.BytesIO(response))` ? – Mark Setchell Jan 02 '20 at 10:57
@MarkSetchell Certainly not! I'm saying that downloading the image and then opening it is considerably faster than streaming it on the fly. The point of `StringIO` was to test something different, switching from `bytes` to `string`. – αԋɱҽԃ αмєяιcαη Jan 02 '20 at 10:58
I still don't understand. OP wrote `requests.get()` followed by `Image.open()` and you suggested `requests.get()` followed by `Image.open()` and you say neither of your 2 lines are faster than the OP's two lines, so how can your code be faster? – Mark Setchell Jan 02 '20 at 11:01
Well, you are missing a part, so I repeat: `f.write(r.content)` is different from `im = Image.open(io.BytesIO(response))`, and as for `Image.open()`, that came after I suggested `stream=True`. – αԋɱҽԃ αмєяιcαη Jan 02 '20 at 11:02
`f.write()` is about saving an image. OP was asking about *"Reading from a URL"*, and specifically says in the comments that he wants to avoid saving it! – Mark Setchell Jan 02 '20 at 11:05
@MarkSetchell I did not look at the title, as I was working from the content. See https://stackoverflow.com/posts/59561093/revisions: `EDIT: I don't want to save image on disk. I just want to read it on the fly.` was added in an edit. – αԋɱҽԃ αмєяιcαη Jan 02 '20 at 11:06

user · Answer 2 · 2020-01-02T12:59:29.110

0

Thank you @αԋɱҽԃαмєяιcαη for your help. I compared the loading times of the different methods. Here's a link to the comparison results.

edited Jan 02 '20 at 12:59

answered Jan 02 '20 at 12:53

user


Please note that the `answer` section is not for replying to me. Please edit your question to include the comparison details; you can delete this answer by clicking `delete` under the answer. – αԋɱҽԃ αмєяιcαη Jan 02 '20 at 15:32
