How to download image using requests

Question

I'm trying to download and save an image from the web using python's requests module.

Here is the (working) code I used:

img = urllib2.urlopen(settings.STATICMAP_URL.format(**data))
with open(path, 'w') as f:
    f.write(img.read())

Here is the new (non-working) code using requests:

r = requests.get(settings.STATICMAP_URL.format(**data))
if r.status_code == 200:
    img = r.raw.read()
    with open(path, 'w') as f:
        f.write(img)

Can you help me on what attribute from the response to use from requests?

Does this answer your question? [Download large file in python with requests](https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests) — AMC, Jan 17 '20 at 03:04

Martijn Pieters · Accepted Answer · 2020-03-04T21:08:43.127

641

You can either use the response.raw file object, or iterate over the response.

The response.raw file-like object will not, by default, decode compressed responses (with GZIP or deflate). You can force it to decompress for you anyway by setting the decode_content attribute to True (requests sets it to False to control decoding itself). You can then use shutil.copyfileobj() to have Python stream the data to a file object:

import requests
import shutil

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)

To iterate over the response use a loop; iterating like this ensures that data is decompressed by this stage:

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r:
            f.write(chunk)

This'll read the data in 128 byte chunks; if you feel another chunk size works better, use the Response.iter_content() method with a custom chunk size:

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r.iter_content(1024):
            f.write(chunk)

Note that you need to open the destination file in binary mode to ensure python doesn't try and translate newlines for you. We also set stream=True so that requests doesn't download the whole image into memory first.
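
Putting the pieces of this answer together, a minimal sketch of a reusable helper might look like the following. The function name `download_image`, the 8192-byte chunk size and the 10-second timeout are assumptions made for illustration and are not part of the original answer; `raise_for_status()` is used here in place of the explicit `status_code == 200` check.

```python
import requests


def download_image(url, path, chunk_size=8192, timeout=10):
    """Stream the body of `url` into the file at `path` and return `path`.

    Hypothetical helper: the name and default values are illustrative.
    """
    # stream=True defers downloading the body until it is iterated over.
    with requests.get(url, stream=True, timeout=timeout) as response:
        # Raise requests.HTTPError for 4xx/5xx instead of saving an error page.
        response.raise_for_status()
        # Binary mode: the chunks are bytes and must be written untouched.
        with open(path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path
```

With the names from the question, it could be called as `download_image(settings.STATICMAP_URL.format(**data), path)`.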

edited Mar 04 '20 at 21:08

answered Oct 30 '12 at 11:18

Martijn Pieters

1,048,767
296
4,058
3,343

2

With the help of your answer I was able to find the data in the text file; the steps I used are `r2 = requests.post(r.url, data); print r2.content`. But now I also want to know the `filename`. At present I find the file name in the header: `r2.headers['content-disposition']` gives me `'attachment; filename=DELS36532G290115.csi'`, and I parse this string for the filename. Is there a cleaner way? – Grijesh Chauhan Jan 29 '15 at 10:39
8

@GrijeshChauhan: yes, the `content-disposition` header is the way to go here; use [`cgi.parse_header()`](https://docs.python.org/2/library/cgi.html#cgi.parse_header) to parse it and get the parameters; `params = cgi.parse_header(r2.headers['content-disposition'])[1]` then `params['filename']`. – Martijn Pieters Jan 29 '15 at 10:41
1

To get the default 128 byte chunks, you need to [iterate over the `requests.Response` itself](https://github.com/kennethreitz/requests/blob/master/requests/models.py#L613): `for chunk in r: ...`. Calling `iter_content()` without a `chunk_size` will [iterate in 1 byte chunks](https://github.com/kennethreitz/requests/blob/master/requests/models.py#L642). – dtk Jun 02 '15 at 23:23
@dtk: thanks, I'll update the answer. Iteration [changed after I posted my answer](https://github.com/kennethreitz/requests/commit/2ac391373329b2d8c67d34fd7c056ff9db16a5f9). – Martijn Pieters Jun 25 '15 at 10:37
Wouldn't it be more pertinent to use `r.ok` instead of `r.status_code == 200` to check for a valid HTTP code? – KumZ Nov 23 '16 at 14:08
2

@KumZ two reasons: `response.ok` was never documented, and it produces true for any 1xx, 2xx or 3xx status, but only a 200 response has a response body. – Martijn Pieters Nov 23 '16 at 19:31
@GrijeshChauhan: to support non-ascii filenames, you could use [`rfc6266-parser` module](https://pypi.python.org/pypi/rfc6266-parser/): `rfc6266.parse_requests_response(r).filename_unsafe` (it uses `url` if the filename can't be extracted from `Content-Disposition`). – jfs Apr 12 '17 at 16:53
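
The `Content-Disposition` parsing discussed in these comments can also be done with the standard library's `email.message.Message`, which may be useful on Python 3.13 and later, where the `cgi` module has been removed. This is a minimal sketch; the helper name `filename_from_content_disposition` is made up for illustration.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None.

    Hypothetical helper built on email.message.Message.get_filename().
    """
    message = Message()
    message["Content-Disposition"] = header_value
    return message.get_filename()


print(filename_from_content_disposition("attachment; filename=DELS36532G290115.csi"))
# DELS36532G290115.csi
```

With requests, the header value would come from `r2.headers['content-disposition']`.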

Oleh Prypin · Answer 2 · 2013-08-13T21:24:48.017

294

Get a file-like object from the request and copy it to a file. This will also avoid reading the whole thing into memory at once.

import shutil

import requests

url = 'http://example.com/img.png'
response = requests.get(url, stream=True)
with open('img.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response

edited Aug 13 '13 at 21:24

answered Aug 04 '13 at 13:32

Oleh Prypin

33,184
10
89
99

19

Thank you so much for coming back and answering this. Though the other answer works, this one is leaps and bounds simpler – dkroy Aug 06 '13 at 04:04
13

It's worth noting that few servers are set to GZIP their images because images already have their own compression. It's counterproductive, wastes CPU cycles with little benefit. So while this may be an issue with text content, specifically with images it's not. – phette23 Sep 11 '14 at 04:19
4

is there any way we can access the original filename – mahes Mar 06 '16 at 13:51
@phette23 It's also worth noting that Google PageSpeed reports and does that by default. – Wernight May 31 '16 at 13:33
15

You should set `r.raw.decode_content = True` before `shutil.copyfileobj(response.raw, out_file)`, because `response.raw` will not, by default, decode compressed responses (with GZIP or deflate), so you will get a zero-file image. – Cloud Dec 29 '16 at 03:42
@SiminJie When you write `r.raw.decode_content` do you mean `response.raw.decode_content`? – Vasilis Apr 08 '20 at 13:34
Nice answer, but I'd add `if response.status_code == 200` before writing to file – Pedro Lobito Jun 24 '20 at 00:20
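
To see why `decode_content` matters, here is a small standard-library sketch that needs no network access. It gzip-compresses some stand-in bytes the way a server sending `Content-Encoding: gzip` would; the payload is made up and is not a complete PNG file. Copying the undecoded stream to disk would store the gzip data rather than the image.

```python
import gzip

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8 bytes every PNG file starts with

# Stand-in for an image body; a real PNG would continue with its chunks.
image_bytes = PNG_SIGNATURE + b"stand-in image data"

# What travels over the wire when the server applies Content-Encoding: gzip.
wire_bytes = gzip.compress(image_bytes)

# Undecoded, the stream starts with the gzip magic number, not the PNG signature.
print(wire_bytes.startswith(PNG_SIGNATURE))  # False
print(wire_bytes[:2] == b"\x1f\x8b")         # True

# Decoding restores the original image bytes.
print(gzip.decompress(wire_bytes) == image_bytes)  # True
```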

score 232 · Answer 3 · edited Nov 11 '16 at 01:25

232

How about this, a quick solution.

import requests

url = "http://craphound.com/images/1006884_2adf8fc7.jpg"
response = requests.get(url)
if response.status_code == 200:
    with open("/Users/apple/Desktop/sample.jpg", 'wb') as f:
        f.write(response.content)

edited Nov 11 '16 at 01:25

kerel

9
3

answered Feb 06 '14 at 06:33

kiranbkrishna

2,490
1
14
10

1

What do you mean by `f = open("/Users/apple/Desktop/sample.jpg", 'wb')`? What is this path for? I want to download an image. – smile Nov 02 '16 at 17:48
6

That opens a file descriptor in the path specified to which the image file can be written. – kiranbkrishna Nov 03 '16 at 10:07
@AndrewGlazkov I think it would be more Pythonic to use `if response.ok:` – EndermanAPM Aug 08 '18 at 18:40
13

response.ok is True for any 1xx, 2xx or 3xx status, but only a 200 response has a response body as @Martijn Pieters mentioned in the comments above – annndrey Jan 12 '19 at 21:46
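
Whichever status check is used, it can also help to confirm that the server actually returned an image before writing anything to disk. A minimal sketch, assuming a hypothetical helper `looks_like_image` that takes the status code and headers of a response:

```python
def looks_like_image(status_code, headers):
    """Return True for a 200 response whose Content-Type is image/*.

    Hypothetical helper; `headers` is any mapping of header names to values.
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "")
    return status_code == 200 and content_type.lower().startswith("image/")


print(looks_like_image(200, {"Content-Type": "image/jpeg"}))  # True
print(looks_like_image(200, {"Content-Type": "text/html"}))   # False
print(looks_like_image(404, {"Content-Type": "image/png"}))   # False
```

With requests this would be called as `looks_like_image(response.status_code, response.headers)`.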

score 91 · Answer 4 · edited Sep 30 '22 at 03:13

91

I have the same need for downloading images using requests. I first tried the answer of Martijn Pieters, and it works well. But when I did a profile on this simple function, I found that it uses so many function calls compared to urllib and urllib2.

I then tried the way recommended by the author of requests module:

import requests
from PIL import Image
# python2.x, use this instead
# from StringIO import StringIO as BytesIO
# for python3.x (r.content is bytes, so BytesIO is required),
from io import BytesIO

r = requests.get('https://example.com/image.jpg')
i = Image.open(BytesIO(r.content))

This greatly reduced the number of function calls and so sped up my application. Here is the code of my profiler and the result.

#!/usr/bin/python
import requests
from io import BytesIO
from PIL import Image
import profile

def testRequest():
    image_name = 'test1.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        # iter_content() defaults to chunk_size=1, i.e. one byte per iteration
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url)

    # r.content is bytes, so wrap it in BytesIO
    i = Image.open(BytesIO(r.content))
    i.save(image_name)

if __name__ == '__main__':
    # testUrllib() and testUrllib2() are not shown in this answer,
    # so only the two requests-based functions are profiled here.
    profile.run('testRequest()')
    profile.run('testRequest2()')

The result for testRequest:

343080 function calls (343068 primitive calls) in 2.580 seconds

And the result for testRequest2:

3129 function calls (3105 primitive calls) in 0.024 seconds

edited Sep 30 '22 at 03:13

Ahmad Raza

13
5

answered Aug 07 '13 at 15:52

Zhenyi Zhang

1,285
10
8

15

This is because you've not specified the `chunk_size` parameter which defaults to 1, so `iter_content` is iterating over the result stream 1 byte at a time. See the documentation http://www.python-requests.org/en/latest/api/#requests.Response.iter_content. – CadentOrange Oct 17 '13 at 15:53
11

This also loads the whole response into memory, which you may want to avoid. There is no need to use `PIL` here either, just `with open(image_name, 'wb') as outfile: outfile.write(r.content)` is enough. – Martijn Pieters Jan 09 '14 at 13:25
4

`PIL` is also not in the standard library making this a bit less portable. – jjj Dec 22 '15 at 21:19
2

@ZhenyiZhang `iter_content` is slow because your `chunk_size` is too small, if you increase it to 100k it will be much faster. – Wang Feb 03 '17 at 17:25
1

This is the best answer. It isn't always best to read the file into memory, but OP specified "images" meaning the files will usually be less than 4MB, thus having a trivial impact on memory. – Chris Conlan Nov 29 '18 at 17:09
8

It appears that `from StringIO import StringIO`, is now `from io import BytesIO` according to the requests author `http://docs.python-requests.org/en/latest/user/quickstart/#binary-response-content` – SeaDude Apr 17 '19 at 04:16
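
The comments above attribute the slow `testRequest` run to the default `chunk_size` of 1 in `iter_content()`. The effect can be illustrated without any network access. In this sketch `iter_chunks` is a hypothetical stand-in that reads a file-like object in fixed-size pieces, and the 100,000-byte payload is arbitrary.

```python
import io


def iter_chunks(fileobj, chunk_size):
    """Yield successive chunks of at most `chunk_size` bytes from `fileobj`."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


payload = bytes(100_000)  # arbitrary stand-in for a downloaded image

# Count how many loop iterations (and write calls) each chunk size would need.
one_byte_chunks = sum(1 for _ in iter_chunks(io.BytesIO(payload), 1))
kilobyte_chunks = sum(1 for _ in iter_chunks(io.BytesIO(payload), 1024))

print(one_byte_chunks)   # 100000
print(kilobyte_chunks)   # 98
```

Every chunk costs at least one read and one write call, which is roughly where the difference in function-call counts comes from.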

Blairg23 · Answer 5 · 2020-04-13T23:58:07.277

74

This might be easier than using requests. This is the only time I'll ever suggest not using requests to do HTTP stuff.

Two liner using urllib:

>>> import urllib.request
>>> urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

There is also a nice Python module named wget that is pretty easy to use. Found here.

This demonstrates the simplicity of the design:

>>> import wget
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
>>> filename = wget.download(url)
100% [................................................] 3841532 / 3841532
>>> filename
'razorback.mp3'

Enjoy.

Edit: You can also add an out parameter to specify a path.

>>> out_filepath = <output_filepath>    
>>> filename = wget.download(url, out=out_filepath)

edited Apr 13 '20 at 23:58

answered Nov 23 '15 at 08:02

Blairg23

11,334
6
72
72

I used `wget` without any hassles. Thanks for stating the benefits of using `urllib3` – Jitendra Apr 02 '20 at 21:41
3

Note that this answer is for Python 2. For Python 3 you need to do `urllib.request.urlretrieve("http://example.com", "file.ext")`. – Husky Apr 09 '20 at 13:14
1

Thanks @Husky. Updated. – Blairg23 Apr 13 '20 at 23:58
Can we compress image size here ? @Blairg23 – Faiyaj Dec 10 '20 at 07:02
@Faiyaj No, this is just `wget`, there is no compression of files. – Blairg23 Dec 13 '20 at 02:50
In PyCharm I had to do `import urllib.request`. With `import urllib` it gave the error `Cannot find reference 'request' in '__init__.pyi'`. `import urllib` did work in the console though. – Roald Feb 14 '22 at 14:43

Katja Süss · Answer 6 · 2020-01-02T16:10:02.307

42

The following code snippet downloads a file and saves it under the filename taken from the last segment of the specified URL.

import requests

url = "http://example.com/image.jpg"
filename = url.split("/")[-1]
r = requests.get(url, timeout=0.5)

if r.status_code == 200:
    with open(filename, 'wb') as f:
        f.write(r.content)
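
Note that `url.split("/")[-1]` keeps any query string and returns an empty name for URLs ending in `/`. A slightly more defensive sketch using only the standard library is shown below; the helper name `filename_from_url` and the fallback name `"download"` are assumptions made for illustration.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Return the last path segment of `url`, ignoring query and fragment.

    Hypothetical helper; falls back to `default` when the path has no name.
    """
    # urlsplit separates the path from the query string and fragment.
    path = unquote(urlsplit(url).path)
    return posixpath.basename(path) or default


print(filename_from_url("http://example.com/image.jpg"))        # image.jpg
print(filename_from_url("http://example.com/image.jpg?w=500"))  # image.jpg
print(filename_from_url("http://example.com/"))                 # download
```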

edited Jan 02 '20 at 16:10

answered Apr 07 '17 at 19:42

Katja Süss

759
6
13

score 24 · Answer 7 · edited May 23 '17 at 11:47

24

There are 2 main ways:

Using .content (simplest/official) (see Zhenyi Zhang's answer):

import io  # Note: io.BytesIO is StringIO.StringIO on Python2.
import requests
from PIL import Image

r = requests.get('http://lorempixel.com/400/200')
r.raise_for_status()
with io.BytesIO(r.content) as f:
    with Image.open(f) as img:
        img.show()

Using .raw (see Martijn Pieters's answer):

import PIL.Image
import requests

r = requests.get('http://lorempixel.com/400/200', stream=True)
r.raise_for_status()
r.raw.decode_content = True  # Required to decompress gzip/deflate compressed responses.
with PIL.Image.open(r.raw) as img:
    img.show()
r.close()  # Safety when stream=True ensure the connection is released.

Timing both shows no noticeable difference.

edited May 23 '17 at 11:47

Community

1
1

answered May 31 '16 at 14:01

Wernight

36,122
25
118
131

3

I tried a bunch of answers, and your `1.` answer (using `io.BytesIO` and `Image`) was the first one that worked for me on Python 3.6. Don't forget `from PIL import Image` (and `pip install Pillow`). – colllin Dec 04 '17 at 23:53
What's different between .content and .raw? – foxiris May 06 '19 at 08:37
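
On the question of `.content` versus `.raw`: `.content` reads the whole body into memory as bytes and caches it, while `.raw` is the underlying stream, which can only be read once. The sketch below builds a `requests.Response` by hand around an `io.BytesIO` so that it runs without network access; this construction is only for illustration, and with a real request `.raw` is a urllib3 response object rather than a `BytesIO`.

```python
import io

import requests

body = b"stand-in image bytes"

# Build a Response by hand so the example needs no network access.
response = requests.Response()
response.status_code = 200
response.raw = io.BytesIO(body)

# .content reads the whole stream into memory and caches the result.
print(response.content == body)  # True
print(response.content == body)  # True (served from the cache)

# .raw is the underlying stream; it has already been drained by .content.
print(response.raw.read())  # b''
```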

score 19 · Answer 8 · answered Sep 17 '18 at 08:33

19

It's as easy as importing Image and requests:

from PIL import Image
import requests

img = Image.open(requests.get(url, stream = True).raw)
img.save('img1.jpg')

answered Sep 17 '18 at 08:33

Riccardo D

591
1
5
12

score 6 · Answer 9 · answered Oct 18 '19 at 10:37

This is how I did it

import requests
from PIL import Image
from io import BytesIO

url = 'your_url'
files = {'file': ("C:/Users/shadow/Downloads/black.jpeg", open('C:/Users/shadow/Downloads/black.jpeg', 'rb'),'image/jpg')}
response = requests.post(url, files=files)

img = Image.open(BytesIO(response.content))
img.show()

score 5 · Answer 10 · edited May 23 '17 at 12:02

Here is a more user-friendly answer that still uses streaming.

Just define these functions and call getImage(). It will use the same file name as the url and write to the current directory by default, but both can be changed.

import requests
from io import BytesIO
from PIL import Image

def createFilename(url, name, folder):
    dotSplit = url.split('.')
    if name is None:
        # use the same as the url
        slashSplit = dotSplit[-2].split('/')
        name = slashSplit[-1]
    ext = dotSplit[-1]
    file = '{}{}.{}'.format(folder, name, ext)
    return file

def getImage(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    with open(file, 'wb') as f:
        r = requests.get(url, stream=True)
        for block in r.iter_content(1024):
            if not block:
                break
            f.write(block)

def getImageFast(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    r = requests.get(url)
    # r.content is bytes, so wrap it in BytesIO (StringIO only works on Python 2)
    i = Image.open(BytesIO(r.content))
    i.save(file)

if __name__ == '__main__':
    # Uses Less Memory
    getImage('http://www.example.com/image.jpg')
    # Faster
    getImageFast('http://www.example.com/image.jpg')

The request guts of getImage() are based on the answer here and the guts of getImageFast() are based on the answer above.

score 5 · Answer 11 · answered May 24 '16 at 13:50

5

I'm going to post an answer as I don't have enough rep to make a comment, but with wget as posted by Blairg23, you can also provide an out parameter for the path.

 wget.download(url, out=path)

answered May 24 '16 at 13:50

justincc

103
2
6

score 5 · Answer 12 · edited Mar 13 '22 at 19:14

5

My approach was to use response.content (a blob) and save it to the file in binary mode:

img_blob = requests.get(url, timeout=5).content
with open(destination + '/' + title, 'wb') as img_file:
     img_file.write(img_blob)

Check out my python project that downloads images from unsplash.com based on keywords.

edited Mar 13 '22 at 19:14

Abhi

1,080
1
7
21

answered Aug 20 '20 at 01:01

Adriano_Pinaffo

1,429
4
23
46

score 4 · Answer 13 · answered Jun 17 '19 at 13:23

4

This is the first response that comes up for google searches on how to download a binary file with requests. In case you need to download an arbitrary file with requests, you can use:

import requests
url = 'https://s3.amazonaws.com/lab-data-collections/GoogleNews-vectors-negative300.bin.gz'
open('GoogleNews-vectors-negative300.bin.gz', 'wb').write(requests.get(url, allow_redirects=True).content)

answered Jun 17 '19 at 13:23

duhaime

25,611
17
169
224

1

Nice! It has even an implicit `.close()`. This is the best answer as of 2019 I guess. – Daniel W. Jun 24 '19 at 02:50
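
The one-liner leaves closing the file to the interpreter (in CPython this happens when the file object is garbage-collected). If an explicit close is preferred, `pathlib.Path.write_bytes` opens the file, writes the data and closes it in a single call. A minimal sketch, where `save_bytes` is a hypothetical helper and the data would be `requests.get(url).content`:

```python
from pathlib import Path


def save_bytes(data, path):
    """Write `data` (bytes) to `path` and return the number of bytes written."""
    # write_bytes opens the file in binary mode and closes it when done.
    return Path(path).write_bytes(data)
```

For example, `save_bytes(requests.get(url, allow_redirects=True).content, 'GoogleNews-vectors-negative300.bin.gz')`.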

score 3 · Answer 14 · edited Jul 05 '23 at 00:28

TL;DR

Summarizing the great answers from others.

| Method | Needs `requests` | Needs PIL | Needs other packages |
|---|---|---|---|
| `requests.get` -> `shutil` | Yes | No | - |
| `requests.get` -> `open(mode="wb")` | Yes | No | - |
| `requests.get` -> `BytesIO` -> `Image.save` | Yes | Yes | - |
| `urllib` | No | No | - |
| `wget` | No | No | `wget` |
| `requests.get` -> `PIL.Image` -> `np.save` | Yes | Yes | `numpy` |
Use `shutil` and output the decoded raw content from `requests.get`

Original answer modified from https://stackoverflow.com/a/13137873/610569

import shutil
import requests

img_url = 'https://techcrunch.com/wp-content/uploads/2023/03/dpreview.jpg'

response = requests.get(img_url, stream=True)        
with open('dpreview.jpg', 'wb') as fout:
    response.raw.decode_content = True
    shutil.copyfileobj(response.raw, fout)

Writing the binary directly into file I/O

import requests

img_url = 'https://techcrunch.com/wp-content/uploads/2023/03/dpreview.jpg'

response = requests.get(img_url, stream=True) 

with open('dpreview.jpg', 'wb') as fout:
    for chunk in response:
        fout.write(chunk)

Stream content into `io.BytesIO` into `PIL.Image` object and save it

from io import BytesIO

import requests
from PIL import Image

img_url = 'https://techcrunch.com/wp-content/uploads/2023/03/dpreview.jpg'

# Stream to BytesIO
response = requests.get(img_url, stream=True)
img = Image.open(BytesIO(response.content))
img.save('dpreview.jpg')


# Using raw content
response = requests.get(img_url, stream=True)
img = Image.open(response.raw)
img.save('dpreview.jpg')

Use `urllib`

Original answer from https://stackoverflow.com/a/33866125/610569

import urllib.request

img_url = 'https://techcrunch.com/wp-content/uploads/2023/03/dpreview.jpg'

urllib.request.urlretrieve(img_url, "dpreview.jpg")

And if a specific user-agent is needed for the request, from https://stackoverflow.com/a/69764951/610569

import urllib.request

opener=urllib.request.build_opener()
opener.addheaders=[('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582')]
urllib.request.install_opener(opener)

img_url = 'https://techcrunch.com/wp-content/uploads/2023/03/dpreview.jpg'

urllib.request.urlretrieve(img_url, "dpreview.jpg")

Use `wget`

import wget

img_url = 'https://techcrunch.com/wp-content/uploads/2023/03/dpreview.jpg'

wget.download(img_url, out='dpreview.jpg')

Saving `PIL.Image` as `numpy` array

import requests
from PIL import Image

import numpy as np


img_url = 'https://techcrunch.com/wp-content/uploads/2023/03/dpreview.jpg'

response = requests.get(img_url, stream=True) 
img = Image.open(response.raw)

# Converts and save image into numpy array.
np.save('dpreview.npy', np.asarray(img))

# Loads a npy file to Image
img_arr = np.load('dpreview.npy')
img = Image.fromarray(img_arr.astype(np.uint8))

score 1 · Answer 15 · edited May 02 '19 at 21:09

1

You can do something like this:

import requests
import random

url = "https://images.pexels.com/photos/1308881/pexels-photo-1308881.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"
name = random.randrange(1, 1000)
filename = str(name) + ".jpg"
response = requests.get(url)
if response.ok:
    # Open in binary mode: response.content is bytes.
    with open(filename, 'wb') as f:
        f.write(response.content)

edited May 02 '19 at 21:09

hkanjih

1,271
1
11
29

answered May 02 '19 at 19:21

Jyotiprakash Das

377
2
5

Dmitriy Zub · Answer 16 · 2021-11-04T18:02:27.463

Agree with Blairg23 that using urllib.request.urlretrieve is one of the easiest solutions.

One thing to note: sometimes nothing is downloaded because the server detects that the request was sent by a script (bot). If you want to parse images from Google Images or other search engines, pass a user-agent in the request headers before downloading the image; otherwise the request will be blocked and an error will be raised.

Pass user-agent and download image:

import urllib.request

opener=urllib.request.build_opener()
opener.addheaders=[('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582')]
urllib.request.install_opener(opener)

urllib.request.urlretrieve(URL, 'image_name.jpg')
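
As an alternative sketch, the header can be attached to a single `urllib.request.Request` instead of installing a global opener, so other code using `urllib` in the same process is unaffected. The names `build_request` and `download` and the User-Agent string below are hypothetical and chosen only for illustration.

```python
import shutil
import urllib.request

# Hypothetical User-Agent string; use whatever the target site accepts.
USER_AGENT = "Mozilla/5.0 (compatible; example-downloader/1.0)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request for `url` that carries a User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, path, user_agent=USER_AGENT):
    """Fetch `url` with the given User-Agent and save the body to `path`."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        # Copy the body to disk in chunks rather than reading it all at once.
        shutil.copyfileobj(response, out)
    return path
```

It could then be called as `download(URL, 'image_name.jpg')`.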

Code in the online IDE that scrapes and downloads images from Google Images using requests, bs4 and urllib.request.

Alternatively, if your goal is to scrape images from search engines like Google, Bing, Yahoo!, DuckDuckGo (and other search engines), then you can use SerpApi. It's a paid API with a free plan.

The biggest difference is that there's no need to figure out how to bypass blocks from search engines or how to extract certain parts from the HTML or JavaScript since it's already done for the end-user.

Example code to integrate:

import json
import os
import urllib.request
from serpapi import GoogleSearch

params = {
  "api_key": os.getenv("API_KEY"),
  "engine": "google",
  "q": "pexels cat",
  "tbm": "isch"
}

search = GoogleSearch(params)
results = search.get_dict()

print(json.dumps(results['images_results'], indent=2, ensure_ascii=False))

# download images 
for index, image in enumerate(results['images_results']):

    # print(f'Downloading {index} image...')
    
    opener=urllib.request.build_opener()
    opener.addheaders=[('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582')]
    urllib.request.install_opener(opener)

    # saves original res image to the SerpApi_Images folder and add index to the end of file name
    urllib.request.urlretrieve(image['original'], f'SerpApi_Images/original_size_img_{index}.jpg')

-----------
'''
[
  # other images
  {
    "position": 100, # 100 image
    "thumbnail": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQK62dIkDjNCvEgmGU6GGFZcpVWwX-p3FsYSg&usqp=CAU",
    "source": "homewardboundnj.org",
    "title": "pexels-helena-lopes-1931367 - Homeward Bound Pet Adoption Center",
    "link": "https://homewardboundnj.org/upcoming-event/black-cat-appreciation-day/pexels-helena-lopes-1931367/",
    "original": "https://homewardboundnj.org/wp-content/uploads/2020/07/pexels-helena-lopes-1931367.jpg",
    "is_product": false
  }
]
'''

Disclaimer, I work for SerpApi.

score -1 · Answer 17 · answered Aug 21 '22 at 22:23

Here is a very simple piece of code:

import requests

response = requests.get("https://i.imgur.com/ExdKOOz.png") ## Making a variable to get image.

file = open("sample_image.png", "wb") ## Creates the file for image
file.write(response.content) ## Saves file content
file.close()

score -3 · Answer 18 · answered Jun 17 '21 at 05:33

-3

To download an image:

import requests
Picture_request = requests.get(url)

answered Jun 17 '21 at 05:33

David Johnson

1
3

1

It would be great if everything was that simple. Unfortunately, the code in your example doesn't save image. It can open the image and that's it. – Dmitriy Zub Oct 29 '21 at 07:14
