UnicodeEncodeError: 'ascii' codec can't encode characters in position 30-31: ordinal not in range(128)

Question

I'm currently studying web scraping, and this is just a test. I have no idea why this error occurs. Could you look at my code, tell me what I did wrong, and help me fix it?

from urllib.request import urlopen    
from bs4 import BeautifulSoup as bs    
from urllib.request import HTTPError    
import sys    
html = urlopen("https://www.expedia.co.kr/Hotel-Search?destination=서울&startDate=2019.06.06&endDate=2019.06.07&rooms=1&adults=2")    
soup = bs(html,"html.parser")    
section = soup.find_all(class_="cf flex-1up flex-listing flex-theme-light cols-nested")    
card = soup.find_all(class_="flex-card")    
infoprice = soup.find_all(class_="flex-content info-and-price MULTICITYVICINITY avgPerNight")    
rows = soup.find_all(class_="flex-area-primary")    
hotelinfo = soup.find_all('ul',class_="hotel-info")    
hotelTitles = soup.find_all('li',class_="hotelTitle")    
for hotelTitle in hotelTitles:        
    hotellist = hotelTitle.find('h4',class_="hotelName fakeLink")        
    h = hotellist.get_text().strip()
    print(h)

Please provide the actual error message, including the traceback showing where in your code the error originates. — MisterMiyagi, May 11 '19 at 16:41
Have a look at the answer by [bobince](https://stackoverflow.com/questions/4389572/how-to-fetch-a-non-ascii-url-with-python-urlopen) on that page. He explains that, strictly speaking, URIs can't contain non-ASCII characters, and shows how to convert one to a plain ASCII URI. — Ahmadore, May 11 '19 at 15:44
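
For reference, here is a minimal standard-library sketch of the conversion described in the comment above. It assumes the query parameters from the question's URL; `BASE_URL`, `build_search_url`, and `fetch_search_page` are names made up for illustration. `urllib.parse.urlencode` UTF-8-encodes and percent-escapes each value, so the resulting URL is plain ASCII and can be handed to `urlopen`. The reported positions 30-31 are consistent with the two Korean characters in the request line `GET /Hotel-Search?destination=서울...`.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint taken from the question; the helper names are illustrative only.
BASE_URL = "https://www.expedia.co.kr/Hotel-Search"


def build_search_url(destination):
    """Return an ASCII-only search URL with percent-encoded query values."""
    params = {
        "destination": destination,
        "startDate": "2019.06.06",
        "endDate": "2019.06.07",
        "rooms": 1,
        "adults": 2,
    }
    # urlencode() encodes non-ASCII text as UTF-8 and percent-escapes it.
    return BASE_URL + "?" + urlencode(params)


def fetch_search_page(destination):
    """Download the search page (performs a network request; not called here)."""
    with urlopen(build_search_url(destination)) as response:
        return response.read()


url = build_search_url("서울")
print(url)
# https://www.expedia.co.kr/Hotel-Search?destination=%EC%84%9C%EC%9A%B8&startDate=2019.06.06&endDate=2019.06.07&rooms=1&adults=2
```

Only the URL construction runs here. Passing the result of `fetch_search_page("서울")` to `bs(...)` should then work as in the question, assuming the site still serves the same markup.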

score 0 · Answer 1 · Xosrov · 2019-05-11T16:02:30.150

Why not use requests instead:

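import requests
from bs4 import BeautifulSoup

html = requests.get("https://www.expedia.co.kr/Hotel-Search?destination=서울&startDate=2019.06.06&endDate=2019.06.07&rooms=1&adults=2")
# html.content holds the raw response bytes; BeautifulSoup detects the encoding
soup = BeautifulSoup(html.content, 'html.parser')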

In my experience it avoids this kind of encoding problem, since requests percent-encodes the non-ASCII characters in the URL for you, and in your case the rest of the code stays the same.

edited May 11 '19 at 16:02

answered May 11 '19 at 15:50


score 0 · Answer 2 · answered May 11 '19 at 16:37

You can mimic the POST request the page makes, using requests. The response is JSON containing all the hotel data. View example JSON response here.

import requests

headers = {'User-Agent': 'Mozilla/5.0', 'Referer': 'https://www.expedia.co.kr/Hotel-Search?destination=%EC%84%9C%E'}
# POST to the data endpoint the page itself calls, then decode the JSON body
r = requests.post("https://www.expedia.co.kr/Hotel-Search-Data?responsive=true&destination=%EC%84%9C%EC%9A%B8&startDate=2019.06.06&endDate=2019.06.07&rooms=1&adults=2&timezoneOffset=3600000&langid=1042&hsrIdentifier=HSR&?1555393986866", headers=headers, data='').json()
# Each entry in retailHotelModels describes one hotel
for hotel in r['searchResults']['retailHotelModels']:
    print(hotel['retailHotelInfoModel']['hotelName'])
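
If the endpoint ever returns an entry without a name, or no results at all, the loop above would raise a `KeyError`. A slightly more defensive sketch of the same traversal might look like this. The key names come from the code above; `extract_hotel_names` is a hypothetical helper, and the sample payload and its hotel names are invented purely so the example runs offline.

```python
def extract_hotel_names(payload):
    """Return hotel names from a decoded search response, skipping incomplete entries."""
    models = payload.get("searchResults", {}).get("retailHotelModels", [])
    names = []
    for hotel in models:
        name = hotel.get("retailHotelInfoModel", {}).get("hotelName")
        if name:  # ignore entries with a missing or empty name
            names.append(name)
    return names


# Invented sample shaped like the keys used above; not real response data.
sample = {
    "searchResults": {
        "retailHotelModels": [
            {"retailHotelInfoModel": {"hotelName": "Example Hotel A"}},
            {"retailHotelInfoModel": {}},
            {"retailHotelInfoModel": {"hotelName": "Example Hotel B"}},
        ]
    }
}

names = extract_hotel_names(sample)
print(names)
```

With the real response you would call `extract_hotel_names(r)` in place of the `for` loop.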
