I wrote a Python script to retrieve an image from a URL:

import urllib.request

url = 'https://uploads0.wikiart.org/images/albrecht-durer/watermill-at-the-montaсa.jpg'
urllib.request.urlretrieve(url, STYLE_IMAGE_UPLOAD + "wikiart" + "/" + url)

When I run it, I get this message:

UnicodeEncodeError: 'ascii' codec can't encode character '\u0441' in position 49: ordinal not in range(128)

I think the problem comes from the image URL:

'https://uploads0.wikiart.org/images/albrecht-durer/watermill-at-the-monta\u0441a.jpg',

How can I fix this problem?

tree em
  • Can you post the full stack trace? Also, your code snippet isn't reproducible, as we don't know what `STYLE_IMAGE_UPLOAD` is. – lenz Dec 07 '19 at 16:12
  • Oh I see, the "c" in "Montaca" is a Cyrillic letter. Have you tried URL-encoding the address? – lenz Dec 07 '19 at 16:28
  • Btw, [here](https://stackoverflow.com/q/4389572)'s a collection of more general approaches and alternative solutions. For example, I learned that the `requests` library handles addresses like that out of the box. – lenz Dec 08 '19 at 19:39

1 Answer

The URL contains a non-ASCII character (a Cyrillic letter that looks like a Latin "c").
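Because the two letters look identical on screen, it can help to confirm which character is the culprit. The following is a minimal sketch using only the standard library; the helper name `find_non_ascii` is made up for this illustration.

```python
import unicodedata

def find_non_ascii(text):
    """Return (index, character, Unicode name) for every non-ASCII character."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if ord(char) > 127
    ]

# The URL from the question, with the Cyrillic letter written as an escape
url = "https://uploads0.wikiart.org/images/albrecht-durer/watermill-at-the-monta\u0441a.jpg"

for index, char, name in find_non_ascii(url):
    print(index, repr(char), name)
```

The reported name should identify the letter as Cyrillic rather than Latin.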

Escape this character using the urllib.parse.quote function:

import urllib.parse
import urllib.request

url = 'https://uploads0.wikiart.org' + urllib.parse.quote('/images/albrecht-durer/watermill-at-the-montaсa.jpg')
urllib.request.urlretrieve(url, '/tmp/watermill.jpg')

Don't pass the entire URL to the quote function: it would also escape the colon (":") in "https://".
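If the URL arrives as one string, one possible way to quote only the path is to split the URL first and reassemble it afterwards. This is a sketch, not the only approach; `quote_url_path` is a hypothetical helper name, and keeping "%" in `safe` is an assumption that any existing percent-escapes should be left alone.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def quote_url_path(url):
    """Percent-encode the path component of a URL, leaving the rest untouched."""
    parts = urlsplit(url)
    # Keep "/" as the path separator and "%" so existing escapes are not re-encoded
    quoted_path = quote(parts.path, safe="/%")
    return urlunsplit(parts._replace(path=quoted_path))

url = "https://uploads0.wikiart.org/images/albrecht-durer/watermill-at-the-monta\u0441a.jpg"
print(quote_url_path(url))

# For comparison: quoting the whole URL also escapes the colon after "https"
print(quote(url))
```

The result of `quote_url_path(url)` can then be passed to `urllib.request.urlretrieve` in place of the original string.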

lenz