I want to scrape this website: https://cage.dla.mil/Home/UsageAgree using Beautiful Soup. What I'm doing:
import requests
from bs4 import BeautifulSoup

url = "https://cage.dla.mil/Home/UsageAgree"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
print(soup)
This returns the HTML of a cookie agreement page. What I want is to get past that page and scrape the content of the actual page that is shown once the cookies are accepted.
I followed this post: Scraping a webpage using Python (beautiful soup) that requires "I agree to cookies" button being clicked?
and did:
import requests
from bs4 import BeautifulSoup

url = 'https://cage.dla.mil/'
s = requests.Session()
s.cookies.update({'agree': 'True'})
r = s.get(url)  # keep the session's response so it is the one parsed below
soup = BeautifulSoup(r.content, "html.parser")
print(soup)
but I'm still getting the agreement page.
It seems that one of the cookies is set to a different, unique value each time, so hard-coding a cookie does not work. I'm not sure how to deal with this.
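One direction that might be worth trying, given that a cookie value changes every time, is to let a requests.Session collect the cookies itself and then submit the agreement form the way a browser would, instead of hard-coding a cookie. The following is only a minimal sketch under assumptions that are not verified against this site: that the agreement page contains a plain HTML form, and that any per-visit token is carried in the form's input fields. The helper names build_form_submission and accept_agreement are hypothetical. If the page relies on JavaScript to accept the agreement, this approach would not be enough.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def build_form_submission(html, page_url):
    """Return (method, action_url, payload) for the first <form>, or None.

    Assumption: the agreement is accepted by submitting the first form
    on the page. Every named <input> is copied into the payload, which
    would include a hidden per-visit token if the site uses one.
    """
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    if form is None:
        return None

    payload = {}
    for field in form.find_all("input"):
        name = field.get("name")
        if name:  # inputs without a name are never submitted
            payload[name] = field.get("value", "")

    # A missing or empty action means "submit to the current page".
    action_url = urljoin(page_url, form.get("action") or page_url)
    method = (form.get("method") or "get").lower()
    return method, action_url, payload


def accept_agreement(session, agreement_url):
    """Load the agreement page and submit its form in the same session."""
    r = session.get(agreement_url)
    r.raise_for_status()
    submission = build_form_submission(r.text, r.url)
    if submission is None:
        return r  # no form found; the page may need JavaScript
    method, action_url, payload = submission
    if method == "post":
        return session.post(action_url, data=payload)
    return session.get(action_url, params=payload)


# Possible usage (not run here, since it needs network access):
# s = requests.Session()
# page = accept_agreement(s, "https://cage.dla.mil/Home/UsageAgree")
# soup = BeautifulSoup(page.content, "html.parser")
```

Because the same session is used for both requests, whatever cookies the server sets on the first response are sent back automatically on the form submission and on any later s.get calls.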