I am trying to do some web scraping for a study project. Unfortunately I need to scrape some data off Google Scholar, which blocks my requests. I have tried using (multiple) HTTP proxies, but my requests still get blocked after ~300 tries.
The HTML returned for the blocked requests contains:
IP address: 145.109...<br/>Time: 2016-05-05T09:23:37Z<br/>URL:
https://scholar.google.nl/citations?hl=en&view_op=search_authors
&mauthors=Perry<br/>
The IP above is my own, even though my proxies dict (a proxy is selected from a list at random) and GET request look like this:
import requests

proxies = {'http': 'http://<username>:<password>@107.182....:<port>'}
result = requests.get(
    'https://scholar.google.nl/citations?hl=en&view_op=search_authors&mauthors=Perry',
    proxies=proxies, headers=headers)
The proxy IPs are of course valid and working, and my own IP is not included in the proxy list. Am I doing something wrong?
Edit: For completeness, I have also tried setting authentication like this answer suggests, but the result is the same.
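One thing that may be worth checking here (a guess, not a confirmed diagnosis): as far as I understand it, requests picks the proxy by the scheme of the URL being requested, so a dict with only an 'http' key might not apply to an https:// URL at all, which would explain why my own IP shows up. The sketch below is a simplified stdlib model of that lookup, not the library's actual code; proxy_for is a hypothetical helper and proxy.example:8080 is a placeholder address.

```python
from urllib.parse import urlparse


def proxy_for(url, proxies):
    """Hypothetical helper: return the proxy that a scheme-keyed
    dict supplies for this URL, or None if the scheme has no entry."""
    scheme = urlparse(url).scheme  # 'http' or 'https'
    return proxies.get(scheme)


url = ('https://scholar.google.nl/citations'
       '?hl=en&view_op=search_authors&mauthors=Perry')

# Placeholder proxy address, assumed for illustration only.
proxy = 'http://user:password@proxy.example:8080'

http_only = {'http': proxy}                  # same shape as the dict above
both = {'http': proxy, 'https': proxy}       # covers both schemes

print(proxy_for(url, http_only))  # None: no entry for the https scheme
print(proxy_for(url, both))       # the placeholder proxy URL
```

If that is indeed the cause, adding an 'https' entry to the proxies dict would be the thing to try.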