
I am trying to access the page https://seekingalpha.com/api/v3/symbols/hsy/press-releases using Python requests.

If I go to the page manually, open the devtools panel, and inspect the request https://seekingalpha.com/api/v3/news?filter[category]=market-news%3A%3Aall&page[size]=5, I can copy-paste the request headers, which contain the site's cookie. By setting those manually I am then able to reach the page using requests:

headers = {
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8,fr;q=0.7',
    'cache-control': 'no-cache',
    'cookie': 'machine_cookie=0427562897246; _pxvid=3a123ba7-3e5d-11eb-a61d-0242ac120017; prism_25946650=8900e9d7-b37b-4d84-9487-0201d2065590; _ga=GA1.2.747536612.1607985570; _gid=GA1.2.1375655082.1607985570; _gcl_au=1.1.785293143.1607985570; __tbc=%7Bjzx%7DlCp-P5kTOqotpgFeypItnMNfCqB03Jnfrv3KQZtXwF3ncfmaDQ98SBay3PXmCBnvPDoBVUmuXm8FouQ0JGElHQT-keiLYDix8RYL4SoUOxJjMub4h3TpZRVQ_edVMb61UVgFAp6l8Mpn6PJS7yCpyA; __pat=-18000000; __pvi=%7B%22id%22%3A%22v-2020-12-14-22-39-29-611-1jo5IsQ7wmgpITTS-3f23a5404637a2a40a85f9ea30050d82%22%2C%22domain%22%3A%22.seekingalpha.com%22%2C%22time%22%3A1607987120778%7D; xbc=%7Bjzx%7DTsLlDv3TXKwd1pAfboMsSnAV3s9R4OnJHOTGW34XfqHE9XguV0cuq-tg-wpJwicWtq5BbeakKu9-e2k9mudI9_nZX365XWEAIEiYbfoRgRsjmdC0GsUSh9_Z0HBjeiY1JY4_tnmYAU4S-z_H3LEmfMyTffbP-zyj1qTHoxeuH9Mm0Ce7LB5xgxX03a65iNmBWhmboGNXjyyWjs7SwY402e_Sk1_4O4l073jBmh9jRLU7AkV6QBL22p2g1qgC78KI12HAOLlDFhRc1QuLjNNzU7G1D3QVi6NamFxveoczdabIhqbAgRSqMRR8tMk-PavGOusVNURIs0m9avquxB0LjhuCIeBKg2K3IABSmyH1pFhZGath0E2HTTJ8ueb6Yj_0oQ8OBqx1YI9l4eFkdPJt06y1_boHQhYNOgCT6OewGdj-ZCEbP5w3D3aSfBbdCgXNgKh2Ys34RMi11ejU7r0TEOdd21h_kWrMpZw7qlE3_Xh9HhtaWjujLnCTpXPgAVId; _uetsid=3a9a15903e5d11ebb0c8cbc81eeed304; _uetvid=3a9a89703e5d11ebb9f4cd856d5d177f; _px2=eyJ1IjoiZDZhY2JmNDAtM2U2MC0xMWViLTljMGItMWQ0OGRjOWJlNDA2IiwidiI6IjNhMTIzYmE3LTNlNWQtMTFlYi1hNjFkLTAyNDJhYzEyMDAxNyIsInQiOjE2MDgwMDgwNTQyMTIsImgiOiIzYTQzMTU3MGJkMzE2ZmQ1YTVkZjM1ZjFmZWU3NTQxYjJiMTcxZmY5M2I0NDUyYzQ1YTFiZGNhOGFiYmRiMDFlIn0=; _px=+bUqf8l/WIbrt+qNCDX+18JknkuO9/05f6FMm402KUELBnVmyufZp2ExW6YDfg8Qu+eI3ae73PcqrVn+numnTQ==:1000:7m/qEw8v5Fh6e0zEdth41JR4ArTi5emjJZWnzK1p2ZznQQQpHdKInTpt8i272JpgAUaJ1jO25sNB4p72C5WOwNgCAyxzECTWG/Mws+llWhTXPmBNGMZFuHCc1P3YPOs4ffSGTx078fuE28EFuQIC3sDnhQum+tIxxwH5UHZkRwiGvL0whtVhUyFsfpdtwPabudbmriBXFvMDq8TOPZPpLzOKVzOzXDVrscLXMpEw14UisbsjBksCU4MhYyRmF03JH2lPI6SbTo8unDxeJhIKZg==; _pxde=3909397fae9c6c84b8595d0ca41405600414dec85ef42530c75d2d03d38258a8:eyJ0aW1lc3RhbXAiOjE2MDgwMDc1NTQyMTIsImZfa2IiOjB9; session_id=80ba4e65-7eda-4b97-b0c7-a1262e40ed4e',
    'pragma': 'no-cache',
    'referer': 'https://seekingalpha.com/symbol/HSY/press-releases',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
}

url = 'https://seekingalpha.com/api/v3/symbols/hsy/press-releases'
requests.get(url, headers=headers)

However, that cookie expires after a few hours, and then I need to repeat the same manual process if I want to use the script again.

I was hoping it would be possible to emulate with requests the same path a human follows manually on the site, i.e. first visiting the gate page https://seekingalpha.com/, having the cookie set on that page, and then reaching the target page.

Something like

headers = {'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
session = requests.Session()
session.headers.update(headers)
session.get('https://seekingalpha.com/')
session.get('https://seekingalpha.com/api/v3/symbols/hsy/press-releases')

However, doing so I receive a 403 error. I have tried to inspect the Network panel in the devtools (refreshing with F5) to find which HTTP response contains the Set-Cookie header, but somehow I couldn't find it (I have used this approach in the past with some success).
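As an aside, requests.Session does persist cookies received from earlier responses and resends them on subsequent requests automatically; a minimal local illustration of the cookie jar (the cookie name and value below are made up, though in practice a call like session.get('https://seekingalpha.com/') would populate the jar from Set-Cookie headers):

```python
import requests

session = requests.Session()

# Simulate a Set-Cookie received from a gate page (made-up value);
# a real session.get() would populate the jar the same way.
session.cookies.set("machine_cookie", "example-value", domain="seekingalpha.com")

# The cookie jar persists across all requests made with this session.
print(session.cookies.get("machine_cookie"))  # example-value
```

So if the 403 persists even with a warm session, the missing piece is likely a cookie the gate page never sets for non-browser clients, rather than the session mechanics themselves.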

jim jarnac

2 Answers


This may or may not help, but you can use Postman's Interceptor functionality to capture the requests made by Chrome. That lets you see the actual requests being sent, which may shed some light on how to resolve the issue.

MarioXbrl

Your headers are wrong; try it like this:

import requests
import json

s = requests.Session()
url = "https://seekingalpha.com/api/v3/symbols/hsy/press-releases"
s.headers = {
    "accept": "application/json, text/plain, */*",
    "accept-language": "en-US,en;q=0.9",
    "cache-control": "no-cache",
    "pragma": "no-cache",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-site",
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}
r = s.get(url)
print(r.text)

Afterwards you can parse the response:

m = json.loads(r.text)
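If the endpoint returns JSON:API-style data, which is only an assumption based on the api/v3 path (the real field names may differ), the individual items could be pulled out like this:

```python
import json

# Hypothetical payload shaped like a JSON:API response; the actual
# structure of the press-releases endpoint may differ.
sample = '{"data": [{"id": "1", "attributes": {"title": "Example press release"}}]}'

m = json.loads(sample)
titles = [item["attributes"]["title"] for item in m["data"]]
print(titles)  # ['Example press release']
```

With a live response, `m = r.json()` is equivalent to `json.loads(r.text)` and saves a step.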