How to use urllib.request to loop through URLs and download images?

Question

My current program looks like this:

import os
import urllib.request


baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"

for i in range(1,48):
    url = baseUrl % i
    urllib.request.urlretrieve(baseUrl, os.path.basename(url))

I haven't coded in Python in a long time, but I wrote this using urllib2 back when I used Python 2.7.

It is supposed to replace the %s in the URL, loop through 1-48, and download all the images to the directory that the script is in. But I get a lot of errors.

Edit: Here is the error that is thrown.

Traceback (most recent call last):
  File "download.py", line 9, in <module>
    urllib.request.urlretrieve(url, os.path.basename(url))
  File "C:\Program Files\Python37\lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Program Files\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Program Files\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Program Files\Python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Program Files\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Program Files\Python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Notice you are passing `baseUrl` into the network call. `baseUrl` is not the `url`; it is your format string and still has the `zzz-%s.jpg` at the end. – RufusVS, Jun 16 '20 at 04:07

score 1 · Answer 1 · answered Jun 16 '20 at 03:37

urllib.request is only available on Python 3, so you have to run the code in Python 3.

namgold

Yes. I'm running on Python 3.7.4 right now, but the code isn't working. It says that urllib has no attribute. – Subash Chandra Jun 16 '20 at 03:41

RufusVS · Answer 2 · 2020-06-16T16:48:17.260

0

Simple fix, if you pass the correct string:

 urllib.request.urlretrieve(url, os.path.basename(url))

The documentation lists urlretrieve under its legacy interface and notes that it might become deprecated in the future, so you might want to find a different way to do this.

I found this alternate approach modified from another SO answer:

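import os
import requests

baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"
for i in range(1, 48):
    url = baseUrl % i
    r = requests.get(url)
    r.raise_for_status()  # stop on a 4xx/5xx reply instead of saving an error page
    # The with block closes the file even if the write fails.
    with open(os.path.basename(url), 'wb') as f:
        f.write(r.content)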
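If installing requests is not an option, a standard-library sketch along the same lines might look like the following. It swaps urlretrieve for urlopen plus shutil.copyfileobj. The helper name `download` and its `dest_dir` parameter are illustrative and not part of the original post.

```python
import os
import shutil
import urllib.request


def download(url, dest_dir="."):
    """Stream one URL into dest_dir and return the path of the saved file."""
    target = os.path.join(dest_dir, os.path.basename(url))
    # urlopen returns a file-like response; copyfileobj copies it in chunks,
    # so the whole image never has to sit in memory at once.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Inside the loop from the question this would be called as `download(baseUrl % i)`.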

edited Jun 16 '20 at 16:48

answered Jun 16 '20 at 04:11

RufusVS

Still not working. I've replaced the baseUrl in that line with url, and I'm still getting the errors. – Subash Chandra Jun 16 '20 at 04:42
Do you know of a non-legacy way to do this? – Subash Chandra Jun 16 '20 at 04:42
You didn't do much research. Stack Overflow has several questions about Error 403 and urlretrieve. Here's one: https://stackoverflow.com/questions/45358126/http-error-403-forbidden-while-downloading-file-using-urllib – RufusVS Jun 16 '20 at 16:35
Modified answer to non-legacy approach. – RufusVS Jun 16 '20 at 16:48
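A frequent cause of 403 responses with urllib is that it identifies itself as Python-urllib/3.x by default, which some servers reject. Below is a minimal sketch that sends an explicit User-Agent header instead. Whether this particular site accepts it is an assumption, and `USER_AGENT`, `build_request` and `download_with_user_agent` are made-up names for illustration.

```python
import os
import shutil
import urllib.request

# Assumption: a generic browser-like value; the site may expect something else.
USER_AGENT = "Mozilla/5.0"


def build_request(url):
    """Wrap the URL in a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def download_with_user_agent(url, dest_dir="."):
    """Fetch url with the custom header and save it under its own file name."""
    target = os.path.join(dest_dir, os.path.basename(url))
    with urllib.request.urlopen(build_request(url)) as response, \
            open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

If the server still answers 403 with this header, the block is probably based on something else (cookies, referer, login), and this sketch will not help.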

score 0 · Answer 3 · answered Jun 16 '20 at 07:57

Try using the requests module:

import os
import requests

baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"

for i in range(1,48):
    url = baseUrl % i
    response = requests.get(url)
    my_raw_data = response.content
    # The with block closes the file on exit, so no explicit close() is needed.
    with open(os.path.basename(url), 'wb') as my_data:
        my_data.write(my_raw_data)

Just to add, you must use url in the request, not baseUrl as shown in your code:

import os
import urllib.request


baseUrl = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"

for i in range(1,48):
    url = baseUrl % i
    # urllib.request.urlretrieve(baseUrl, os.path.basename(url))
    # Use this line instead:
    urllib.request.urlretrieve(url, os.path.basename(url))

Run this in Python 3.
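Two details of the loop are worth a sketch. First, range(1,48) stops at 47, so covering 1-48 as the question describes needs an end value of 49. Second, a single failing URL aborts the whole run unless the error is caught. The helper names `build_urls` and `download_all` below are hypothetical, and urlretrieve is kept only to match the code above.

```python
import os
import urllib.error
import urllib.request

BASE_URL = "https://website.com/wp-content/upload/xxx/yyy/zzz-%s.jpg"


def build_urls(template, first, last):
    """Return the URLs for first..last, with both ends included."""
    # range() excludes its end value, hence last + 1.
    return [template % i for i in range(first, last + 1)]


def download_all(urls, dest_dir="."):
    """Try every URL and return the (url, error) pairs that failed."""
    failures = []
    for url in urls:
        target = os.path.join(dest_dir, os.path.basename(url))
        try:
            urllib.request.urlretrieve(url, target)
        except urllib.error.URLError as exc:  # HTTPError is a subclass
            failures.append((url, exc))
    return failures
```

Calling `download_all(build_urls(BASE_URL, 1, 48))` would attempt all 48 images and hand back whatever failed, for example the 403 responses, instead of stopping at the first one.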
