I'm trying to crawl a website using the requests library. However, the particular website I am trying to access (http://www.vi.nl/matchcenter/vandaag.shtml) has a very intrusive cookie statement.

I am trying to access the website as follows:

from bs4 import BeautifulSoup as soup
import requests

website = "http://www.vi.nl/matchcenter/vandaag.shtml"
html = requests.get(website, headers={"User-Agent": "Mozilla/5.0"})
htmlsoup = soup(html.text, "html.parser")

This returns a web page that consists of just the cookie statement with a big button to accept. If you try accessing this page in a browser, you find that pressing the button redirects you to the requested page. How can I do this using requests?

I considered using mechanize.Browser but that seems a pretty roundabout way of doing it.

user2229219

2 Answers


Try setting:

import requests

website = "http://www.vi.nl/matchcenter/vandaag.shtml"
cookies = dict(BCPermissionLevel='PERSONAL')
html = requests.get(website, headers={"User-Agent": "Mozilla/5.0"}, cookies=cookies)

This will bypass the cookie consent page and land you straight on the requested page.

Note: You can find the above by analyzing the JavaScript that runs on the cookie consent page; it is a bit obfuscated, but it should not be difficult. If you run into the same kind of problem again, look at what cookies the JavaScript executed by the button's event handler sets.
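If you want to check what a site sets server-side before digging through its JavaScript, here is a minimal sketch using a requests.Session. Note this only surfaces cookies sent in Set-Cookie headers; cookies set purely in JavaScript, as on consent pages like this one, will not show up here and you have to read the script itself:

import requests

session = requests.Session()
response = session.get("http://www.vi.nl/matchcenter/vandaag.shtml",
                       headers={"User-Agent": "Mozilla/5.0"})

# Cookies set via Set-Cookie headers on this particular response
print(response.cookies.get_dict())
# Everything accumulated in the session's cookie jar so far
print(session.cookies.get_dict())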

George Daramouskas

I found this SO question, which asks how to send cookies in a POST using requests. The accepted answer there states that recent builds of Requests will build CookieJars for you from simple dictionaries. Below is the proof-of-concept code included in that answer.

import requests

# requests converts this plain dict into a CookieJar for you
cookie = {'enwiki_session': '17ab96bd8ffbe8ca58a78657a918558'}

r = requests.post('http://wikipedia.org', cookies=cookie)
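If you are crawling more than one page on the same site, a requests.Session keeps cookies between requests, so the consent cookie only needs to be set once. A minimal sketch, assuming the BCPermissionLevel cookie from the other answer is the one the consent button sets:

import requests

session = requests.Session()
# Set the consent cookie once; the session sends it with every request
session.cookies.set("BCPermissionLevel", "PERSONAL")

response = session.get("http://www.vi.nl/matchcenter/vandaag.shtml",
                       headers={"User-Agent": "Mozilla/5.0"})
print(response.status_code)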
Koga