I am trying to fetch some details from a website using the requests module, and I have found that the request only works if I set the cookie in a header. However, I don't understand how to retrieve this cookie from code.

If I copy the cookie from Chrome Developer Tools and send it with the request, it works. However, the cookie expires after some time and I have to copy and paste it again. Is there a way to retrieve or renew it automatically?

Code:

import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'cookie': 'visid_incap_820541=xigWzrvDQcSUJ0mvESKe+BR9KlwAAAAAQUIPAAAAAABRN2d88YW7aPzz88KJGqf2; optimizelyEndUserId=oeu1546288405916r0.23219734574282036; _gcl_au=1.1.1732703525.1546288407; _ga=GA1.2.125112106.1546288407; pCode=L7R 0B4; PageSize=15; AAMC_traderca_0=REGION%7C7; aam_uuid=61828363884759157680150590708572742734; .ASPXANONYMOUS=_6qGJdrX1AEkAAAANDYzYmFjYjMtOTVjMi00MzI0LWIyNTItOTZiNGNhOWUwYTI4YJzGhgZ555Ei_Iv_SWlhHlzaRMQ1; SearchResultOrderBy=PriceDesc; DealerLeadsPreTestKey=True; at_uid=mfAcGz813UijYm%2f9Gc2qqw%3d%3d; InternalSignInComplete=False; InternalSignInCompleteNew=False; cc_audpid=430e20732b28f2c7ba2d5be3182cf0ec; {E7ABF06F-D6A6-4c25-9558-3932D3B8A04D}=optimizelyEndUserId=oeu1546288405916r0.23219734574282036&pCode=L7R+0B4&PageSize=15&AAMC_traderca_0=REGION%257C7&cc_audpid=430e20732b28f2c7ba2d5be3182cf0ec&AMCVS_2650037254CC132F0A4C98A6%40AdobeOrg=1&culture=en-ca&uag=69962FE6D5D8F6D8A13AA09DEAA150E0AF8ACC624C0681A20A7E9500C633BA4F&SortOrder=PriceDesc&AMCV_2650037254CC132F0A4C98A6%40AdobeOrg=1099438348%257CMCIDTS%257C17902%257CMCMID%257C61762905408050898680174600361146729466%257CMCAAMLH-1546893208%257C7%257CMCAAMB-1547408986%257CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%257CMCOPTOUT-1546811386s%257CNONE%257CvVersion%257C2.1.0&srchLocation=%257B%2522Location%2522%3a%257B%2522Address%2522%3anull%2c%2522City%2522%3a%2522Burlington%2522%2c%2522Latitude%2522%3a43.38621%2c%2522Longitude%2522%3a-79.83713%2c%2522Province%2522%3a%2522ON%2522%2c%2522PostalCode%2522%3anull%2c%2522Type%2522%3a%2522%2522%257D%2c%2522UnparsedAddress%2522%3a%2522Burlington%2c%2520ON%2522%257D&searchState=%7b%22isUniqueSearch%22%3afalse%2c%22make%22%3a%22Honda%22%2c%22model%22%3a%22Civic%22%7d&lastsrpurl=%2fcars%2fhonda%2fcivic%2fon%2fburlington%2f&gtm_inmarket_search=true; __utmz=1.1547176815.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); SortOrder=PriceDesc; nlbi_820541_1646237=PMSfBHhxEHYkcGWQCOa5EgAAAAA4tf9PayhQkLrUhubKcWP7; 
AMCVS_2650037254CC132F0A4C98A6%40AdobeOrg=1; ASP.NET_SessionId=bcm0ercgtdwelfb0dgdj21vk; culture=en-ca; nlbi_820541_1646235=9pBCfqSeBlaU/rZuCOa5EgAAAACsm/k0oZhXrHPgQUgIdx2f; __utmc=1; 359_MVT=Production; incap_ses_677_820541=TuyqcgMGP0m7jlSD9TBlCfUdSlwAAAAAxKFz0lmbGa9NKShPYAlkCQ==; incap_ses_1002_820541=3YXVWNg2qgVt4jzJPNLnDaBFS1wAAAAAqTH+dOhbKju7PE7ya/m6JA==; incap_ses_530_820541=g8iydJkbsko61boENvFaB4ybTFwAAAAAAUoqmF8+aQR3K0DMuEkVxw==; _fbp=fb.1.1548524429333.920537732; _gid=GA1.2.216334428.1548524430; AMCV_2650037254CC132F0A4C98A6%40AdobeOrg=1099438348%7CMCIDTS%7C17922%7CMCMID%7C61762905408050898680174600361146729466%7CMCAAMLH-1549129230%7C7%7CMCAAMB-1549129230%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCOPTOUT-1548531630s%7CNONE%7CvVersion%7C2.1.0; srchLocation=%7B%22Location%22:%7B%22Address%22:null,%22City%22:%22Burlington%22,%22Latitude%22:43.3377685546875,%22Longitude%22:-79.80254364013672,%22Province%22:%22ON%22,%22PostalCode%22:%22L7R%200B4%22,%22Type%22:%22%22%7D,%22UnparsedAddress%22:%22L7R0B4%22%7D; lastsrpurl=/cars/honda/accord/on/burlington/?rcp=15&rcs=0&srt=3&trim=EX-L%2CEX-L%20w-Navi%2CSport%2CTouring&yRng=2014%2C&pRng=%2C17500&prx=25&prv=Ontario&loc=L7R0B4&trans=Automatic&hprc=True&wcp=True&sts=New-Used&nod=4%2B%20Door&inMarket=advancedSearch; searchState={"isUniqueSearch":false,"make":"Honda","model":"Accord"}; uag=DD83109745972DB18984A1EEEA659BA45124E731A1CA35CF766CEB2C78CDA978; PreviouslyViewedPVs=5-42423732%2c5-42065851%2c5-42244841%2c5-42278097%2c5-41992218%2c19-10936588%2c5-42160387%2c5-42241651%2c5-41965246%2c5-42082483%2c5-37819580%2c5-41702376%2c5-42242890%2c5-41373481%2c5-41778296%2c5-41482594%2c19-10932841%2c19-10932842%2c5-42444524%2c5-42424838%2c5-42293767%2c19-10923281%2c19-10930137%2c5-41790203%2c5-41136493%2c19-10902368%2c19-10918059%2c5-42386690%2c5-42192341%2c5-41718075; searchFlag=true; __utma=1.125112106.1546288407.1548216821.1548525035.9; __utmt=1; __utmb=1.4.9.1548525052914; 
_4c_=jVNdb9sgFP0rFa9LY8CA7bx1nbRWWtuHds8RH9eJFddYQJZlUf77LmnSLq20DckycM79PuzIZgkDmTEpasklYyVrxISsYBvJbEdC5%2FLvB5kRJhpFTQVClFWtJLeVsla2LeXQmLY1ZEJ%2BZj9SUCXwU6zaT0iwJ%2FsU1vCOU9GGZk46clrdR%2FhAkUixg%2B3%2FSVqE9uQpl0MF5VwKpj6QFZLdcErMQavXfTrLTXLRlAxp3SvrHY6Oa8RbE06M7RujkRzDsypX9%2FzWQqCSNwqXYU4ooZgpuZGWKUMZdUa9OpA1U5yqUuUU7Hi035F1wC6QZUpjnBXFZrOZ6nXyKWgHYWp1oYsbPzhdXFnrg%2Fv0CE4PxWe06oZF8kPxMCQdOl%2FIueCCl1XJ5w%2F3nNKacZZnrxhORBWYh%2FUO8tCbaTWleE6%2F8HRZ0ryHIRczhtySMXi3tmmetmPmb8BcRLdCoIvf%2FGIB7hb7TO7x4kbHa4%2FxbQL3CH0P4QiY4DfxcLpeBv8MFxXDW4%2F6I3fa4jZACyEcGHiKXcqRzgo%2FXqNqz5DLAzLmbEvc9N7qPtui4LPXsdfbeZ7O%2FwwmQoydHw50Km3VWqksVwJkpcHIxijRGFM1lpn8FL5ezb%2FffskdzI%2BKM6qmqEnF6xoV%2BIL%2FDX4KHTYv3EFa%2BhzxCevpEsbXPXnR7h%2Bydah6YnuNCVoHcZX8SPZHKYlaMSZkSfOrTgnlUytB89rv978B'
}

resp = requests.get(
    "https://www.autotrader.ca/",
    # "https://www.autotrader.ca/a/Honda/Accord+Sedan/Burlington/Ontario/5_42423732_ON20081215113610906/",
    headers=headers)

print(resp.status_code)
roschach
Gaurang Shah

1 Answer

You can call cookies.get_dict() on the response to get the cookies as a dictionary. If you need the raw Set-Cookie header sent by the server, it is available in the response headers.

import requests

s = requests.Session()
r = s.get('http://www.google.com')
sep = "\n------------------\n"
print(r.headers, end=sep)                # all response headers
print(r.headers['Set-Cookie'], end=sep)  # raw Set-Cookie header value
print(r.cookies.get_dict())              # cookies parsed into a dict

Output

{'Date': 'Sat, 26 Jan 2019 20:59:50 GMT', 'Expires': '-1', 'Cache-Control': 'private, max-age=0', 'Content-Type': 'text/html; charset=ISO-8859-1', 'P3P': 'CP="This is not a P3P policy! See g.co/p3phelp for more info."', 'Content-Encoding': 'gzip', 'Server': 'gws', 'Content-Length': '5360', 'X-XSS-Protection': '1; mode=block', 'X-Frame-Options': 'SAMEORIGIN', 'Set-Cookie': '1P_JAR=2019-01-26-20; expires=Mon, 25-Feb-2019 20:59:50 GMT; path=/; domain=.google.com, NID=156=DqD5DO6OULcovwiJYJF3fFCU6FDUPP9xqCjdIzMVA48TXdk46ZMV-MeJl5Eg_4chXeZHAtKT-WiIEAiRFXSH8SF_riyegpizTr1xQFegMu2dF7rFpCuWnL8IlBhEtp6BYwUHYifWxUzBIQjAnKVbz1_am1j2vW90QsRkNpiDqvw; expires=Sun, 28-Jul-2019 20:59:50 GMT; path=/; domain=.google.com; HttpOnly'}
------------------
1P_JAR=2019-01-26-20; expires=Mon, 25-Feb-2019 20:59:50 GMT; path=/; domain=.google.com, NID=156=DqD5DO6OULcovwiJYJF3fFCU6FDUPP9xqCjdIzMVA48TXdk46ZMV-MeJl5Eg_4chXeZHAtKT-WiIEAiRFXSH8SF_riyegpizTr1xQFegMu2dF7rFpCuWnL8IlBhEtp6BYwUHYifWxUzBIQjAnKVbz1_am1j2vW90QsRkNpiDqvw; expires=Sun, 28-Jul-2019 20:59:50 GMT; path=/; domain=.google.com; HttpOnly
------------------
{'1P_JAR': '2019-01-26-20', 'NID': '156=DqD5DO6OULcovwiJYJF3fFCU6FDUPP9xqCjdIzMVA48TXdk46ZMV-MeJl5Eg_4chXeZHAtKT-WiIEAiRFXSH8SF_riyegpizTr1xQFegMu2dF7rFpCuWnL8IlBhEtp6BYwUHYifWxUzBIQjAnKVbz1_am1j2vW90QsRkNpiDqvw'}
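
Since the question is about expiry, it may help to read the attributes of a single cookie. The following is a minimal sketch using the standard library's http.cookies.SimpleCookie on one cookie copied from the Set-Cookie output above. It assumes you pass it one cookie at a time, because the header value shown above joins several cookies with commas.

```python
from http.cookies import SimpleCookie

# One cookie taken from the Set-Cookie output above
raw = ("1P_JAR=2019-01-26-20; expires=Mon, 25-Feb-2019 20:59:50 GMT; "
       "path=/; domain=.google.com")

jar = SimpleCookie()
jar.load(raw)  # parse the name, value and attributes

for name, morsel in jar.items():
    # Each morsel exposes the value plus attributes such as expires and domain
    print(name, morsel.value, morsel["expires"], morsel["domain"])
```

When the cookies come from requests, iterating over r.cookies and reading each cookie's expires attribute is another way to get at the same information.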

You can also take a look at Requests Session objects, which let you persist certain parameters, including cookies, across requests.

import requests

s = requests.Session()
r = s.get("https://www.autotrader.ca/")
print(s.cookies.get_dict())  # cookies the session has stored so far

Output

{'359_MVT': 'Beta', 'incap_ses_427_820268': 'CH+XZmtI+XAoSGd5jgPtBYbSTFwAAAAAisPp/ga12qcaus8OQBy+WQ==', 'incap_ses_427_820541': '5w7HIplKciGGR2d5jgPtBYXSTFwAAAAAtK6JqBFZ7yOfMEQRTxsb4w==', 'nlbi_820541_1646237': '6PXif98z32ITUgWNCOa5EgAAAAB0qBmvDiWBSIKScEbsrrei', 'visid_incap_820268': '8Y9/QUrMSN6ig2Eh8yaQBobSTFwAAAAAQUIPAAAAAAC3xP7V2sSXvYIv1o3+boYi', 'visid_incap_820541': 'GTvhgGUCSPiBzrX555BMD4XSTFwAAAAAQUIPAAAAAADrOrmirYySt7jxsjvAx4e6', '___utmvavlufwBX': 'FRU\x01rTwc', '___utmvbvlufwBX': 'JZt\r\n    XUhORalX: Ltz', '___utmvmvlufwBX': 'ISmGYgcVGsA'}
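
For the renewal part of the question, one possible approach is to wrap the session so that it fetches fresh cookies when a response looks expired. This is only a sketch: RenewingClient, landing_url and the choice of 401/403 as "expired" status codes are assumptions made for illustration, so check what the target site actually returns.

```python
class RenewingClient:
    """Wrap a session and re-fetch cookies when a response looks expired.

    `session` is anything with .get() and .cookies.clear(), such as
    requests.Session(). The status codes treated as "expired" are an
    assumption, not something the site documents.
    """

    def __init__(self, session, landing_url, expired_statuses=(401, 403)):
        self.session = session
        self.landing_url = landing_url
        self.expired_statuses = expired_statuses

    def get(self, url, **kwargs):
        resp = self.session.get(url, **kwargs)
        if resp.status_code in self.expired_statuses:
            # Drop the stale cookies, revisit the landing page so the server
            # can issue new ones, then retry the original request once.
            self.session.cookies.clear()
            self.session.get(self.landing_url)
            resp = self.session.get(url, **kwargs)
        return resp
```

With requests installed, RenewingClient(requests.Session(), "https://www.autotrader.ca/") followed by client.get(...) would be one way to use it. This only helps when the server sets the needed cookies in plain HTTP responses; cookies created by JavaScript in the browser will not appear in the session.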

That being said, I don't think Requests is the right tool for the job here. Selenium, which drives a real browser, can be used to scrape these kinds of websites.

For example, printing the headline from your commented-out URL:

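from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('https://www.autotrader.ca/a/Honda/Accord+Sedan/Burlington/Ontario/5_42423732_ON20081215113610906/')
# find_element_by_css_selector was removed in newer Selenium 4 releases; use find_element with By
title = driver.find_element(By.CSS_SELECTOR, 'h1').text
print(title)
driver.quit()  # close the browser when done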

Output

2014 Honda Accord EX-L|SERVICE HISTORY ON FILE - Burlington
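
If you would rather keep using requests for the actual fetching, the cookies collected by the browser can be handed over to it. Below is a rough sketch: selenium_cookies_to_dict is a hypothetical helper, and the sample list only imitates the shape of what driver.get_cookies() returns (a list of dicts with "name" and "value" keys); its values are made up.

```python
def selenium_cookies_to_dict(cookies):
    """Map Selenium's list of cookie dicts to a plain {name: value} dict."""
    return {c["name"]: c["value"] for c in cookies}


# Sample shaped like driver.get_cookies() output; the values are invented
sample = [
    {"name": "culture", "value": "en-ca", "domain": "www.autotrader.ca", "path": "/"},
    {"name": "PageSize", "value": "15", "domain": "www.autotrader.ca", "path": "/"},
]

cookie_dict = selenium_cookies_to_dict(sample)
print(cookie_dict)

# With a live driver and requests installed, usage might look like:
# resp = requests.get(url, cookies=selenium_cookies_to_dict(driver.get_cookies()))
```

When the cookies expire, loading the page in the browser again and repeating the conversion gives you a fresh set without any manual copy and paste.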
Bitto