I need help encoding a non-ASCII URL into a form that urlopen() accepts.
My code scrapes a (non-ASCII) URL from a page and then moves on to that page:
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Entrance URL, copy-pasted from the Chrome browser:
url = 'https://www.sheypoor.com/%DA%A9%D9%85%D8%AF-%D9%86%D9%88%D8%AC%D9%88%D8%A7%D9%86-34926671.html'
for i in range(1, 10):
    html = urlopen(url)
    page = BeautifulSoup(html.read(), 'html.parser')
    url_obj = page.findAll('a')[13]['href'].strip()
    print(url_obj)
    url = url_obj
But I got an error:
'ascii' codec can't encode characters in position 5-9: ordinal not in range(128)
When I checked the UnicodeEncodeError traceback, it pointed to this line:
----> 8 html = urlopen(url)
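The positions in the message seem to match the non-ASCII path. The sketch below assumes (this is my reading of the behaviour, not something confirmed above) that the HTTP request line has the form "GET <path> HTTP/1.1" and is encoded as ASCII before being sent. The `request_line` string is built by hand for illustration only, using the path of the scraped URL:

```python
# Hand-built request line (an assumption about what gets sent on the wire).
request_line = "GET /سرویس-تخت-کمد-نوجوان-44887762.html HTTP/1.1"

message = None
bad_span = None
try:
    # Encoding as ASCII fails on the first run of Persian characters.
    request_line.encode("ascii")
except UnicodeEncodeError as exc:
    message = str(exc)
    bad_span = (exc.start, exc.end)  # end index is exclusive

print(message)
print(bad_span)
```

Under that assumption, "GET /" occupies positions 0-4, so the first non-ASCII word of the path starts at position 5, which agrees with "position 5-9" in the error.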
On the first iteration, urlopen() works with the entrance URL because it is already percent-encoded:
https://www.sheypoor.com/%DA%A9%D9%85%D8%AF-%D9%86%D9%88%D8%AC%D9%88%D8%A7%D9%86-34926671.html
The problem starts on the next iteration, when url is replaced with url_obj, which is scraped from the BeautifulSoup object and looks like this:
https://www.sheypoor.com/سرویس-تخت-کمد-نوجوان-44887762.html
This form is not suitable for passing to urlopen().
I tried to find a way to convert url_obj into the same percent-encoded form as the entrance URL, but I failed! :-(
I would appreciate any guidance on solving this problem.
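One possible direction, given only as a minimal sketch: percent-encode the path and query with urllib.parse.quote before calling urlopen(). The helper name `to_ascii_url` is hypothetical, and the sketch assumes the host name itself is plain ASCII (a non-ASCII host would need separate handling) and that the URL is UTF-8 encoded, which is the default for quote().

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper; assumes an ASCII host name and UTF-8 encoding.
    """
    parts = urlsplit(url)
    # '%' is kept as a safe character so that a URL which is already
    # percent-encoded is not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


raw = 'https://www.sheypoor.com/سرویس-تخت-کمد-نوجوان-44887762.html'
encoded = to_ascii_url(raw)
print(encoded)
```

In the loop, this would mean assigning `url = to_ascii_url(url_obj)` instead of `url = url_obj`. Because '%' is treated as safe, passing the already-encoded entrance URL through the helper should leave it unchanged.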