I am trying to scrape a Japanese website (a trimmed down sample below):
<html>
<head>
<meta charset="euc-jp">
</head>
<body>
<h3>不審者の出没</h3>
</body>
</html>
I am trying to get data of this html by request package using:
response = requests.get(url)
I am getting data from h3 field in as: '¡ÊÂçʬ' and unicode value of it is like this:
'\xa4\xaa\xa4\xaa\xa4\xa4\xa4\xbf\'
but when I load this html from a file or from a local wsgi server (tried with Django to serve a static html page) then I get:
不審者の出没. It's actual data.
Now I am not understanding how to resolve this issue?