
This is my code:

import urllib.request

imglinks = ["http://www.katytrailweekly.com/Files/MalibuPokeMatt_©Marple_449-EDITED_15920174118.jpg"]

for link in imglinks:
    filename = link.split('/')[-1]
    urllib.request.urlretrieve(link, filename)

It gives me the error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xa9'

How do I solve this? I tried using .encode('utf-8'), but it gives me:

TypeError: cannot use a string pattern on a bytes-like object

  • Possible duplicate of [How do I convert a unicode to a string at the Python level?](https://stackoverflow.com/questions/2783079/how-do-i-convert-a-unicode-to-a-string-at-the-python-level) – Priyank Mehta Nov 22 '17 at 11:17
  • @PriyankMehta None of the answers in the question helped. What do I do in my code? – user8578016 Nov 22 '17 at 11:22
  • See this link: https://stackoverflow.com/questions/4389572/how-to-fetch-a-non-ascii-url-with-python-urlopen – MarAja Nov 22 '17 at 11:23

1 Answer


The problem here is not the encoding of the string itself, but that the URL has to be percent-encoded before it is passed to `urllib.request`.

You need to quote the url as follows:

import urllib.request
import urllib.parse

imglinks = ["http://www.katytrailweekly.com/Files/MalibuPokeMatt_©Marple_449-EDITED_15920174118.jpg"]

for link in imglinks:
    link = urllib.parse.quote(link,safe=':/') # <- here
    filename = link.split('/')[-1]
    urllib.request.urlretrieve(link, filename)

This way your © symbol is percent-encoded as %C2%A9 (its UTF-8 bytes), which is what the web server expects.

The safe parameter is specified so that quote does not also modify the : after http (by default only / is left untouched).
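To see what the safe parameter changes, here is a small sketch that quotes the URL from the question both ways and prints the results. The variable names are only illustrative, and nothing is downloaded.

```python
import urllib.parse

url = "http://www.katytrailweekly.com/Files/MalibuPokeMatt_©Marple_449-EDITED_15920174118.jpg"

# Default safe='/' : the ':' after "http" is percent-encoded too,
# which leaves a string that is no longer a usable URL.
default_quoted = urllib.parse.quote(url)

# safe=':/' : both ':' and '/' are left alone, only © is encoded.
kept_colon = urllib.parse.quote(url, safe=':/')

print(default_quoted)
print(kept_colon)
```

Note that quote would also encode characters such as ? and = unless they are listed in safe, so a URL with a query string would need them added there.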

It is up to you to modify the code so that the file is saved under its original filename rather than the quoted one. ;)
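One possible way to do that, as a minimal sketch: take the filename from the link before quoting it, so the name on disk keeps the ©. The helper names split_link and download_all are hypothetical, not part of urllib, and this assumes your filesystem accepts non-ASCII filenames.

```python
import urllib.parse
import urllib.request


def split_link(link):
    """Return (percent-encoded URL, original filename) for one link."""
    # Take the filename first, while the link still contains the raw ©.
    filename = link.split('/')[-1]
    # Quote afterwards; ':' and '/' stay as they are.
    quoted = urllib.parse.quote(link, safe=':/')
    return quoted, filename


def download_all(links):
    """Download every link, saving each file under its unquoted name."""
    for link in links:
        quoted, filename = split_link(link)
        urllib.request.urlretrieve(quoted, filename)
```

Calling download_all(imglinks) with the list from the question should then request the %C2%A9 URL but write MalibuPokeMatt_©Marple_449-EDITED_15920174118.jpg locally.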

Paolo Casciello