I am using requests
to request a page. The task is very simple, but I have a problem with encoding. The page contains non-ascii, Turkish characters, but in the HTML source, the result is as below:
ÇINARTEPE # What it looks like
ÇINARTEPE # What it is like in HTML source
So, the operations below do not return what I expected:
# What I have tried as encoding
req.encoding = "utf-8"
req.encoding = "iso-8859-9"
req.encoding = "iso-8859-1"
# The operations
"ÇINARTEPE" in req.text # False, it must return True
bytes("ÇINARTEPE", "utf-8") in req.content # False
bytes("ÇINARTEPE", "iso-8859-9") in req.content # False
bytes("ÇINARTEPE", "iso-8859-1") in req.content # False
All I want is to find out if "ÇINARTEPE"
string is in HTML source.
Further Information
An example:
req = requests.get("http://www.eshot.gov.tr/tr/OtobusumNerede/290")
"ÇINARTEPE" in req.text # False
req.encoding = "iso-8859-1"
"ÇINARTEPE" in req.text # False
req.encoding = "iso-8859-9"
"ÇINARTEPE" in req.text # False
# Supposed to return True
Environment
- python 3.5.1
- requests 2.10.0