I'm crawling some Persian/Farsi websites using the requests library in Python. When I use the "get" method, most of the websites respond correctly, but a few send back unreadable characters. This is an example of a response obtained with the get method of the requests library:

And this is my code:

import scrapy
import requests
from requests.auth import HTTPBasicAuth

url = "http://irban.ir/ShowNews/6833/%D8%B3%DB%8C%D8%A7%D8%B3%D8%AA%DA%AF%D8%B0%D8%A7%D8%B1%DB%8C-%D8%AF%D9%88%D9%84%D8%AA-%D8%AF%D8%B1-%D8%AD%D9%88%D8%B2%D9%87-%D9%85%D8%B3%DA%A9%D9%86-%D8%AA%D8%BA%DB%8C%DB%8C%D8%B1-%D9%85%DB%8C-%DA%A9%D9%86%D8%AF"
response = requests.get(url, auth=HTTPBasicAuth('test', 'testpass'),
                            headers={
                                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'},
                            verify=False, timeout=60).text
selector = scrapy.Selector(text=response)
css_pattern = ".forTitle"
selected_value = selector.css(css_pattern).extract_first()
print(selected_value)
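
One possible cause, which has not been confirmed for this site, is an encoding mismatch: when a server sends a text content type without a charset, requests falls back to ISO-8859-1 for `.text`, which turns UTF-8 Persian text into unreadable characters. The sketch below works on raw bytes so it runs without network access. The helper name `decode_body` is hypothetical and not part of requests.

```python
def decode_body(raw, declared_encoding=None):
    """Decode response bytes, preferring UTF-8 over a possibly wrong default.

    `raw` would be `resp.content` and `declared_encoding` would be
    `resp.encoding` from a requests response (assumed usage).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to whatever encoding was declared.
        return raw.decode(declared_encoding or "utf-8", errors="replace")


# Offline demonstration of the suspected failure mode.
sample = "سیاستگذاری دولت".encode("utf-8")
garbled = sample.decode("iso-8859-1")      # what a wrong default produces
fixed = decode_body(sample, "iso-8859-1")  # UTF-8 is tried first

print(garbled)
print(fixed)
```

With the code above, this would mean keeping the response object instead of calling `.text` right away, then passing `decode_body(resp.content, resp.encoding)` to `scrapy.Selector`. If the body is still unreadable after that, the cause is probably something other than the charset.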
mrasoolmirza
