I'm trying to retrieve HTML content from a PHP script using Python's requests library.
The script lives on my local Apache server, and I access it directly at: http://localhost/aaa/index.php
The script's content is:
<?php
$headers = json_encode(apache_request_headers());
?>
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>The Title</title>
<meta name="description" content="The Title">
</head>
<body>
<?php echo json_encode($headers); ?>
</body>
</html>
Accessing the above script directly in the browser produces the following response:
<head>
<meta charset="utf-8">
<title>The Title</title>
<meta name="description" content="The Title">
</head>
<body>
"{\"Host\":\"localhost\",\"User-Agent\":\"Mozilla\\\/5.0 (Windows NT 6.3; WOW64; rv:42.0) Gecko\\\
/20100101 Firefox\\\/42.0\",\"Accept\":\"text\\\/html,application\\\/xhtml+xml,application\\\/xml;q=0
.9,*\\\/*;q=0.8\",\"Accept-Language\":\"en-US,en;q=0.5\",\"Accept-Encoding\":\"gzip, deflate\",\"Cookie
\":\"menu=users%3Bconfiguration; fieldset=; PHPSESSID=tn82odn5hdtr45mw0bkd6rhf56; nr
=5c3ab462abb1d3364b8ba59fa4d8b7f6; ru=popopo; rp=64864wb5630986rgn5860f52vy0614909b8a8736
\",\"Connection\":\"keep-alive\",\"Cache-Control\":\"max-age=0\"}"
</body>
</html>
When I access the same URL (http://localhost/aaa/index.php) using Python, I get a different response.
The Python code:
import requests
url = "http://localhost/aaa/index.php"
headers = {
    'User-Agent': 'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'Keep-Alive',
    'Content-Type': 'text/html; charset=UTF-8',
}
req = requests.get(url, headers=headers)
print("Body :::", req.content)
And the response:
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>The Title</title>
<meta name="description" content="The Title">
</head>
<body>
"{\\"Host\\":\\"localhost\\",\\"Accept-Encoding\\":\\"gzip,
deflate\\",\\"Accept-Language\\":\\"en-US,en;q=0.5\\",
\\"Accept-Charset\\":\\"ISO-8859-1,utf-8;q=0.7,*;q=0.3\\",
\\"User-Agent\\":\\"Mozilla\\\\\\/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident
\\\\\\/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)\\",\\"Accept\\":\\"text\\\\\\/html,application
\\\\\\/xhtml+xml,application\\\\\\/xml;q=0.9,*
\\\\\\/*;q=0.8\\",\\"Connection\\":\\"Keep-Alive
\\",\\"Content-Type\\":\\"text\\\\\\/html; charset=UTF-8\\"}"
</body>
</html>
Notice that the "Cookie" header is missing when I request the resource with Python. The cookie is what I actually want to retrieve, because I need it in order to read content from other PHP pages.
I also tried the following, with no success:
import requests
url = "http://localhost/aaa/index.php"
session = requests.Session()
# headers is the same dict as in the first snippet
response = session.get(url, headers=headers)
print("Cookies :::", session.cookies.get_dict())
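The only workaround I can think of is to copy the Cookie value from the browser output by hand and send it explicitly. Below is a minimal, untested sketch of that idea. The names cookie_header_to_dict and fetch_with_cookies are my own (hypothetical), and the cookie string is a shortened copy of the browser output above.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header_value):
    """Parse a raw Cookie header string into a {name: value} dict."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}


def fetch_with_cookies(url, cookies):
    """Send a GET request with the given cookies and return the body text."""
    # Imported here so the parsing helper above works without requests installed.
    import requests

    response = requests.get(url, cookies=cookies, timeout=10)
    return response.text


# Shortened copy of the Cookie value shown in the browser response (assumption:
# these values are still valid for the current PHP session).
browser_cookie = (
    "menu=users%3Bconfiguration; fieldset=; "
    "PHPSESSID=tn82odn5hdtr45mw0bkd6rhf56; ru=popopo"
)
cookies = cookie_header_to_dict(browser_cookie)
```

Calling fetch_with_cookies("http://localhost/aaa/index.php", cookies) should then make the script echo a "Cookie" entry, but this only re-sends values I copied manually. It does not retrieve the cookie from the browser automatically.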
Is there any way to accomplish that?