I have data that I need to modify using the first result of a Google search. The search has to be repeated about 300,000 times (once per row), each time with a different search keyword.
I wrote a bash script for this using wget. However, after about 30 (sequential) requests, my queries seem to get blocked:
Connecting to www.google.com (www.google.com)|74.125.24.103|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
ERROR 404: Not Found.
I am using this snippet:
wget -qO- --limit-rate=20k --user-agent='Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0' "http://www.google.de/search?q=wikipedia%20$encodedString"
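Here $encodedString holds the URL-encoded search keyword. For illustration only, one way such a value could be built (assuming python3 is on the PATH; the $keyword variable is hypothetical):

keyword='some search term'
encodedString=$(python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1]))' "$keyword")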
I depend on this working, so I hope someone has experience with this kind of problem. It is not a recurring job and does not need to finish quickly; it would even be acceptable if the 300,000 requests took more than a week, which works out to roughly one request every two seconds.
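For reference, this is the kind of throttled loop I have in mind; a sketch assuming the keywords sit one per line in a file keywords.txt and results are appended to results.txt (both file names are illustrative), with a randomized pause averaging about two seconds between requests:

# read one keyword per line and fetch the corresponding search page
while IFS= read -r keyword; do
    encodedString=$(python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1]))' "$keyword")
    wget -qO- --limit-rate=20k --user-agent='Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0' "http://www.google.de/search?q=wikipedia%20$encodedString" >> results.txt
    sleep $((1 + RANDOM % 3))   # pause 1-3 seconds so the rate stays well below one request per second
done < keywords.txt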