I am having an encoding issue: when I make the exact same request from my spider and from the Scrapy shell, the responses do not come back with the same encoding.
For example, when scraping with my spider:
def parse(self, response):
    print(response.headers[b'Content-Type'])
b'text/html; charset=utf-8'
Whereas when using the Scrapy shell:
scrapy shell https://www.agoravox.fr/tribune-libre/article/attentat-contre-charlie-hebdo-161711
>>> response.headers[b'Content-Type']
b'text/html; charset=iso-8859-1'
This is a real problem because the page is actually encoded in ISO-8859-1, so the text scraped by my spider ends up full of Unicode replacement characters. Any ideas?
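To make the symptom concrete, here is a minimal sketch that reproduces it outside Scrapy. It assumes the body bytes are ISO-8859-1 while the declared charset is UTF-8; the sample string, `raw_body`, and `decode_body` are made up for illustration, with `raw_body` standing in for the bytes available as `response.body`.

```python
# Hypothetical sample text, encoded the way the page is actually encoded.
# In a spider, these bytes would come from response.body.
raw_body = "Attentat contre Charlie Hebdo : réaction".encode("iso-8859-1")


def decode_body(body: bytes, declared: str) -> str:
    """Decode body with the declared charset, substituting U+FFFD for bad bytes."""
    return body.decode(declared, errors="replace")


# Decoding with the charset my spider is told about mangles the accented letter.
wrong = decode_body(raw_body, "utf-8")
# Decoding with the charset the shell reports gives the expected text.
right = decode_body(raw_body, "iso-8859-1")

print(wrong)
print(right)
```

Decoding `response.body` explicitly like this would probably hide the symptom in the spider, but it does not explain why the two `Content-Type` headers differ in the first place, which is what I would like to understand.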
Thank you