
I am running into Google's cookie consent page when scraping a Google redirect URL.

I am trying to scrape different pages through Google News URLs, but when I run this code:

req = requests.get(url, headers=headers)

with headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-US) AppleWebKit/534.1 (KHTML, like Gecko) Chrome/6.0.422.0 Safari/534.1', 'Upgrade-Insecure-Requests': '1', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'DNT': '1', 'Accept-Encoding': 'gzip, deflate', 'Accept-Language': 'it-IT'}
and, for example, url = https://news.google.com/./articles/CAIiEMb3PYSjFFVbudiidQPL79QqGQgEKhAIACoHCAow-ImTCzDRqagDMKiIvgY?hl=it&gl=IT&ceid=IT%3Ait

req.content is the HTML of Google's cookie consent page rather than the article.

I have also tried to convert the redirect link into a normal link, but the response gives me the redirect link to this

My problem is the same as the one described in this question (How can I bypass a cookie agreement page while web scraping using Python?).

However, the solution proposed there works only for that specific site.

Note: the entire code worked until a few weeks ago.

Macintosh_89

1 Answer


I solved the problem by adding the line

'Cookie':'CONSENT=YES+cb.20210418-17-p0.it+FX+917; '

to the request header.
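
As a minimal sketch of that change, the helper below copies a headers dict and adds the CONSENT cookie to it. The function names with_consent_cookie and fetch are hypothetical, and the cookie value is simply the one quoted above; whether Google still accepts that exact value is an assumption that may no longer hold.

```python
def with_consent_cookie(headers, consent="YES+cb.20210418-17-p0.it+FX+917"):
    """Return a copy of `headers` with the CONSENT cookie added (hypothetical helper)."""
    merged = dict(headers)  # do not mutate the caller's dict
    cookie = "CONSENT=" + consent
    existing = merged.get("Cookie")
    # Keep any cookie that is already set and append the consent cookie to it.
    merged["Cookie"] = f"{existing}; {cookie}" if existing else cookie
    return merged


def fetch(url, headers):
    """Request `url` with the consent cookie set (hypothetical helper)."""
    # Third-party dependency, imported here so the header helper above
    # can be used and tested without requests installed.
    import requests

    return requests.get(url, headers=with_consent_cookie(headers), timeout=10)
```

With the headers and url from the question, the call would then be req = fetch(url, headers).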

The page returned by the request is still a Google page, but it contains the link to the site the article actually comes from.

So, once I had that page, I did some more scraping to extract the link and then made the request I actually wanted.
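
The extra scraping step could look roughly like the sketch below. It uses only the standard library and picks the first absolute link that does not point to a google.com host. That heuristic, and the names LinkCollector and first_external_link, are assumptions for illustration; the real structure of the intermediate Google page is not shown here, so the selection rule may need adjusting.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def first_external_link(html, own_domain="google.com"):
    """Return the first http(s) link whose host is outside own_domain, or None."""
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.links:
        parsed = urlparse(href)
        host = parsed.hostname or ""
        if parsed.scheme not in ("http", "https") or not host:
            continue  # skip relative links, mailto:, javascript:, etc.
        if host == own_domain or host.endswith("." + own_domain):
            continue  # skip links that stay on Google
        return href
    return None
```

Given the response from the first request, link = first_external_link(req.text) returns the candidate article URL (or None if nothing matched), which can then be passed to requests.get together with the same headers.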