I am trying to access the page https://seekingalpha.com/api/v3/symbols/hsy/press-releases using Python requests.
If I visit the page manually, open the devtools panel, and inspect the request to https://seekingalpha.com/api/v3/news?filter[category]=market-news%3A%3Aall&page[size]=5, I can copy the request headers, which contain the site's cookie. By setting those headers manually, I am then able to reach the page using requests:
import requests
headers = {
'accept': '*/*',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8,fr;q=0.7',
'cache-control': 'no-cache',
'cookie': 'machine_cookie=0427562897246; _pxvid=3a123ba7-3e5d-11eb-a61d-0242ac120017; prism_25946650=8900e9d7-b37b-4d84-9487-0201d2065590; _ga=GA1.2.747536612.1607985570; _gid=GA1.2.1375655082.1607985570; _gcl_au=1.1.785293143.1607985570; __tbc=%7Bjzx%7DlCp-P5kTOqotpgFeypItnMNfCqB03Jnfrv3KQZtXwF3ncfmaDQ98SBay3PXmCBnvPDoBVUmuXm8FouQ0JGElHQT-keiLYDix8RYL4SoUOxJjMub4h3TpZRVQ_edVMb61UVgFAp6l8Mpn6PJS7yCpyA; __pat=-18000000; __pvi=%7B%22id%22%3A%22v-2020-12-14-22-39-29-611-1jo5IsQ7wmgpITTS-3f23a5404637a2a40a85f9ea30050d82%22%2C%22domain%22%3A%22.seekingalpha.com%22%2C%22time%22%3A1607987120778%7D; xbc=%7Bjzx%7DTsLlDv3TXKwd1pAfboMsSnAV3s9R4OnJHOTGW34XfqHE9XguV0cuq-tg-wpJwicWtq5BbeakKu9-e2k9mudI9_nZX365XWEAIEiYbfoRgRsjmdC0GsUSh9_Z0HBjeiY1JY4_tnmYAU4S-z_H3LEmfMyTffbP-zyj1qTHoxeuH9Mm0Ce7LB5xgxX03a65iNmBWhmboGNXjyyWjs7SwY402e_Sk1_4O4l073jBmh9jRLU7AkV6QBL22p2g1qgC78KI12HAOLlDFhRc1QuLjNNzU7G1D3QVi6NamFxveoczdabIhqbAgRSqMRR8tMk-PavGOusVNURIs0m9avquxB0LjhuCIeBKg2K3IABSmyH1pFhZGath0E2HTTJ8ueb6Yj_0oQ8OBqx1YI9l4eFkdPJt06y1_boHQhYNOgCT6OewGdj-ZCEbP5w3D3aSfBbdCgXNgKh2Ys34RMi11ejU7r0TEOdd21h_kWrMpZw7qlE3_Xh9HhtaWjujLnCTpXPgAVId; _uetsid=3a9a15903e5d11ebb0c8cbc81eeed304; _uetvid=3a9a89703e5d11ebb9f4cd856d5d177f; _px2=eyJ1IjoiZDZhY2JmNDAtM2U2MC0xMWViLTljMGItMWQ0OGRjOWJlNDA2IiwidiI6IjNhMTIzYmE3LTNlNWQtMTFlYi1hNjFkLTAyNDJhYzEyMDAxNyIsInQiOjE2MDgwMDgwNTQyMTIsImgiOiIzYTQzMTU3MGJkMzE2ZmQ1YTVkZjM1ZjFmZWU3NTQxYjJiMTcxZmY5M2I0NDUyYzQ1YTFiZGNhOGFiYmRiMDFlIn0=; _px=+bUqf8l/WIbrt+qNCDX+18JknkuO9/05f6FMm402KUELBnVmyufZp2ExW6YDfg8Qu+eI3ae73PcqrVn+numnTQ==:1000:7m/qEw8v5Fh6e0zEdth41JR4ArTi5emjJZWnzK1p2ZznQQQpHdKInTpt8i272JpgAUaJ1jO25sNB4p72C5WOwNgCAyxzECTWG/Mws+llWhTXPmBNGMZFuHCc1P3YPOs4ffSGTx078fuE28EFuQIC3sDnhQum+tIxxwH5UHZkRwiGvL0whtVhUyFsfpdtwPabudbmriBXFvMDq8TOPZPpLzOKVzOzXDVrscLXMpEw14UisbsjBksCU4MhYyRmF03JH2lPI6SbTo8unDxeJhIKZg==; _pxde=3909397fae9c6c84b8595d0ca41405600414dec85ef42530c75d2d03d38258a8:eyJ0aW1lc3RhbXAiOjE2MDgwMDc1NTQyMTIsImZfa2IiOjB9; session_id=80ba4e65-7eda-4b97-b0c7-a1262e40ed4e',
'pragma': 'no-cache',
'referer': 'https://seekingalpha.com/symbol/HSY/press-releases',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'same-origin',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
}
url = 'https://seekingalpha.com/api/v3/symbols/hsy/press-releases'
response = requests.get(url, headers=headers)
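As a side note on the manual step, the copied cookie string does not have to live inside the headers dict. Below is a minimal sketch that splits such a string into name/value pairs. The helper name `cookie_header_to_dict` is made up for illustration, and the sample string is a shortened, invented one rather than the real cookie.

```python
def cookie_header_to_dict(raw_cookie):
    """Split a copied 'cookie' request header into a {name: value} dict.

    Hypothetical helper, not part of requests. Only the first '=' of each
    pair separates name from value, so base64 padding ('==') is preserved.
    """
    cookies = {}
    for pair in raw_cookie.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # skip empty fragments and pairs without '='
            cookies[name] = value
    return cookies


# Invented sample value, shaped like the header copied from devtools.
sample = "machine_cookie=123; token=abc==; session_id=xyz"
parsed = cookie_header_to_dict(sample)
print(parsed)
```

The resulting dict can be passed through the `cookies=` argument of `requests.get` instead of a `'cookie'` header entry. This only tidies the manual approach; it does nothing about the cookie expiring.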
However, that cookie expires after a few hours, and then I need to repeat the same manual process if I want to use the script again.
I was hoping it would be possible to emulate with requests the same path a human follows on the site, i.e. first visiting the landing page https://seekingalpha.com/, having the cookie set by that page, and then being able to reach the target page.
Something like:
headers = {'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
session = requests.Session()
session.headers.update(headers)
session.get('https://seekingalpha.com/')  # landing page, hoped to set the cookie
session.get('https://seekingalpha.com/api/v3/symbols/hsy/press-releases')
However, when I do this I receive a 403 error. I have tried inspecting the network panel in devtools, reloading the page with F5, to find which HTTP request carries the Set-Cookie header, but somehow I couldn't find it (I have used this approach in the past with some success).
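The same check can also be done from Python rather than devtools. The sketch below lists, for each hop of a requests response (redirects included), whether a Set-Cookie header came back. The function names `cookies_set_along_the_way` and `probe_landing_page` are made up for illustration. `probe_landing_page` is only defined, not called, since it needs network access and the third-party requests package.

```python
def cookies_set_along_the_way(response):
    """Return (url, Set-Cookie header or None) for every hop of a response.

    Works on a requests.Response: `response.history` holds the redirect
    responses that preceded the final one.
    """
    hops = list(response.history) + [response]
    return [(hop.url, hop.headers.get("Set-Cookie")) for hop in hops]


def probe_landing_page():
    """Fetch the landing page in a Session and print what set cookies."""
    import requests  # third-party; assumed to be installed

    session = requests.Session()
    session.headers["user-agent"] = (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
    )
    landing = session.get("https://seekingalpha.com/")
    print(landing.status_code)
    for url, set_cookie in cookies_set_along_the_way(landing):
        print(url, "->", set_cookie)
    # Cookies the Session would send on the next request.
    print(session.cookies.get_dict())
```

If this prints few or no cookies compared with the browser, one possible explanation (not verified here) is that some of them are set by JavaScript running in the page, which requests does not execute, so they would never show up as Set-Cookie headers.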