I am very new to all of this; I need to obtain data on several thousand sourceforge projects for a paper I am writing. The data is all freely available in json format at the url http://sourceforge.net/api/project/name/[project name]/json. I have a list of several thousand of these URL's and I am using the following code.
import grequests
rs = (grequests.get(u) for u in ulist)
answers = grequests.map(rs)
Using this code I am able to obtain the data for any 200 or so projects I like, i.e. rs = (grequests.get(u) for u in ulist[0:199])
works, but as soon as I go over that, all attempts are met with
ConnectionError: HTTPConnectionPool(host='sourceforge.net', port=80): Max retries exceeded with url: /api/project/name/p2p-fs/json (Caused by <class 'socket.gaierror'>: [Errno 8] nodename nor servname provided, or not known)
<Greenlet at 0x109b790f0: <bound method AsyncRequest.send of <grequests.AsyncRequest object at 0x10999ef50>>(stream=False)> failed with ConnectionError
I am then unable to make any more requests until I quit python, but as soon as I restart python I can make another 200 requests.
I've tried using grequests.map(rs,size=200)
but this seems to do nothing.