This code works as expected:

from urllib.request import urlopen 
with urlopen('https://mr.wikipedia.org/s/4jp4') as f:
    f.read().decode('utf-8')

But similar code raises an error, even though both URLs point to the same wiki article.

from urllib.request import urlopen 
with urlopen('https://mr.wikipedia.org/wiki/किशोरावस्था') as f:
    f.read().decode('utf-8')

UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-20: ordinal not in range(128)

I need to use Python's built-in modules only, so I cannot use the requests module.


The following works, but in my case the URL comes from an API and I do not know in advance which part needs quoting. Is there a more general solution, like what requests does?

from urllib.parse   import quote
from urllib.request import urlopen

url = 'https://mr.wikipedia.org/wiki/' + quote("किशोरावस्था")
content = urlopen(url).read()
shantanuo

1 Answer

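The non-ASCII characters in the URL are what cause the error. Percent-encode that part of the path with quote before calling urlopen:

from urllib.parse import quote
from urllib.request import urlopen
with urlopen('https://mr.wikipedia.org/wiki/' + quote('किशोरावस्था')) as f: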
    f.read().decode('utf-8')
Joshua Varghese
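
For the more general case raised in the question, where the full URL arrives from an API and it is not known which part contains non-ASCII text, one possible approach is to split the URL, encode each component, and put it back together. The sketch below uses only the standard library. The names to_ascii_url and fetch are hypothetical helpers, not part of urllib, and the code assumes the server expects UTF-8 percent-encoding (which Wikipedia does). Keeping % in the safe set means already-quoted URLs are not quoted twice, at the cost of leaving a stray literal % untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from urllib.request import urlopen

# Characters left unescaped: RFC 3986 delimiters plus '%' so that
# existing percent-escapes are not encoded a second time.
SAFE = "/%:@!$&'()*+,;=?"


def to_ascii_url(url):
    """Return an ASCII-only form of url (hypothetical helper)."""
    parts = urlsplit(url)
    # Non-ASCII host names are converted with the built-in IDNA codec.
    netloc = parts.netloc.encode('idna').decode('ascii')
    # Path, query and fragment are percent-encoded as UTF-8.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE)
    fragment = quote(parts.fragment, safe=SAFE)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


def fetch(url):
    """Download url and decode the body as UTF-8 (assumed encoding)."""
    with urlopen(to_ascii_url(url)) as f:
        return f.read().decode('utf-8')
```

With this, fetch('https://mr.wikipedia.org/wiki/किशोरावस्था') and fetch('https://mr.wikipedia.org/s/4jp4') should both go through the same code path, since a URL that is already ASCII is returned unchanged.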