I want to crawl a website (gently) and download each HTML page that I visit. To accomplish that I use the requests library. I already built my list of URLs to crawl, and I first tried fetching them with urllib.request.urlopen, but without a user-agent header I get an error response. So I chose to use requests instead, but I don't really know how to use it properly.
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1'
}
page = requests.get('http://www.xf.com/ranking/get/?Amount=1&From=left&To=right', headers=headers)
with open('pages/test.html', 'w') as outfile:
    outfile.write(page.text)
The problem is that when the script tries to write the response to my file, I get an encoding error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6673-6675: ordinal not in range(128)
How can I write the response to a file without running into these encoding problems?
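For context, the error happens because open(..., 'w') without an explicit encoding falls back to a locale-dependent default (here ASCII), which cannot represent the non-ASCII characters in page.text. A minimal sketch of the fix, using a sample string in place of the real response text so it runs without network access:

import os
import tempfile

# Stand-in for page.text: HTML containing non-ASCII characters
html_text = "<html><body>café – naïve ©</body></html>"

path = os.path.join(tempfile.mkdtemp(), "test.html")

# Passing encoding='utf-8' to open() sidesteps the default 'ascii'
# codec that raises UnicodeEncodeError on some systems
with open(path, "w", encoding="utf-8") as outfile:
    outfile.write(html_text)

# Reading back with the same encoding restores the exact text
with open(path, encoding="utf-8") as infile:
    assert infile.read() == html_text

With a real requests response, the alternative is to skip decoding entirely and write the raw bytes: open the file in binary mode ('wb') and write page.content instead of page.text, which preserves the page in whatever encoding the server sent.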