I'm crawling some Persian/Farsi websites using the requests library in Python. When I use the "get" method, most of the websites respond fine, but a few others send back garbled characters. This is an example of a response obtained with the get method of the requests library:
In Persian (what I am supposed to receive): سیاستگذاری دولت در حوزه مسکن تغییر می کند؟
response: سÛ\x8cاستگذارÛ\x8c دÙ\x88Ù\x84ت در Ø\xadÙ\x88زÙ\x87 Ù\x85سکÙ\x86 تغÛ\x8cÛ\x8cر Ù\x85Û\x8c Ú©Ù\x86دØ\x9f
And this is my code:
import scrapy
import requests
from requests.auth import HTTPBasicAuth
url = "http://irban.ir/ShowNews/6833/%D8%B3%DB%8C%D8%A7%D8%B3%D8%AA%DA%AF%D8%B0%D8%A7%D8%B1%DB%8C-%D8%AF%D9%88%D9%84%D8%AA-%D8%AF%D8%B1-%D8%AD%D9%88%D8%B2%D9%87-%D9%85%D8%B3%DA%A9%D9%86-%D8%AA%D8%BA%DB%8C%DB%8C%D8%B1-%D9%85%DB%8C-%DA%A9%D9%86%D8%AF"
response = requests.get(
    url,
    auth=HTTPBasicAuth('test', 'testpass'),
    headers={
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'},
    verify=False,
    timeout=60).text
selector = scrapy.Selector(text=response)
css_pattern = ".forTitle"
selected_value = selector.css(css_pattern).extract_first()
print(selected_value)
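
The sequences Û\x8c and Ù\x88 in the response above are what the UTF-8 bytes of ی and و look like when they are read as ISO-8859-1 (Latin-1), so one possible explanation is that the page is UTF-8 but is being decoded with the wrong charset. The sketch below only illustrates that hypothesis offline; the sample string and variable names are assumptions made for the illustration, not something taken from the site.

```python
# Assumption: the page is really UTF-8 but gets decoded as ISO-8859-1 (Latin-1).
expected = "سیاستگذاری دولت"

# Simulate the suspected mis-decoding: UTF-8 bytes read as Latin-1.
garbled = expected.encode("utf-8").decode("latin-1")

# Reverse it: recover the original bytes via Latin-1, then decode them as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(repaired)
```

If this guess is right, keeping the Response object from requests.get and setting its encoding attribute to "utf-8" before reading .text may be enough, since requests can fall back to ISO-8859-1 when a text response does not declare a charset. I have not confirmed that this is what happens for this particular site.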