Update: The problem turned out to be incomplete documentation: the event dispatcher passes keyword arguments to the hook function, so the hook has to accept them.

I have a list of about 30k URLs that I want to check for various strings. I have a working version of this script using Requests & BeautifulSoup, but it doesn't use threading or asynchronous requests so it's incredibly slow.

Ultimately, what I would like to do is cache the HTML for each URL so I can run multiple checks without making redundant HTTP requests to each site. If I have a function that stores the HTML, what's the best way to send the HTTP GET requests asynchronously and then pass the response objects to that function?

I've been trying to use Grequests (as described here) and the "hooks" parameter, but I'm getting errors and the documentation doesn't go very in-depth. So I'm hoping someone with more experience can shed some light.

Here's a simplified example of what I'm trying to accomplish:

import grequests

urls = ['http://www.google.com/finance','http://finance.yahoo.com/','http://www.bloomberg.com/']

def print_url(r):
    print r.url

def async(url_list):
    sites = []
    for u in url_list:
        rs = grequests.get(u, hooks=dict(response=print_url))
        sites.append(rs)
    return grequests.map(sites)

print async(urls)

And it produces the following TypeError:

TypeError: print_url() got an unexpected keyword argument 'verify'
<Greenlet at 0x32803d8L: <bound method AsyncRequest.send of <grequests.AsyncRequest object at 0x00000000028D2160>>(stream=False)> failed with TypeError

Not sure why it's sending 'verify' as a keyword argument by default; it would be great to get something working though, so if anyone has any suggestions (using grequests or otherwise) please share :)

Thanks in advance.

Kernel Panic

1 Answer

I tried your code and got it to work by adding a **kwargs parameter to your print_url function:

def print_url(r, **kwargs):
    print r.url

I figured out what was wrong from this other Stack Overflow question: "Problems with hooks using Requests Python package".
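The failure can be reproduced without any network access. The sketch below is a simplified stand-in for a hook dispatcher, not the actual Requests or grequests code: dispatch_hook, strict_hook, tolerant_hook and try_hook are hypothetical names, and the keyword arguments passed (verify, stream) are only examples taken from the traceback in the question.

```python
def dispatch_hook(hooks, hook_data, **kwargs):
    """Simplified stand-in for a dispatcher: call each hook with the data
    plus whatever extra keyword arguments the caller supplied."""
    for hook in hooks:
        result = hook(hook_data, **kwargs)
        # A hook may return a replacement object; None means "keep the old one".
        if result is not None:
            hook_data = result
    return hook_data


def strict_hook(r):
    # Accepts only the response, like print_url in the question.
    return r


def tolerant_hook(r, **kwargs):
    # Accepts and ignores any extra keyword arguments.
    return r


def try_hook(hook):
    """Return 'ok' if the hook survives dispatch, else the TypeError text."""
    try:
        dispatch_hook([hook], "fake response", verify=True, stream=False)
        return "ok"
    except TypeError as exc:
        return str(exc)


strict_result = try_hook(strict_hook)      # TypeError message as a string
tolerant_result = try_hook(tolerant_hook)  # "ok"
print(strict_result)
print(tolerant_result)
```

The strict hook fails for the same reason print_url does: the caller supplies keyword arguments that the function signature does not accept.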

It seems that when you use the response hook in grequests, the callback must accept **kwargs, because the dispatcher passes extra keyword arguments (such as verify) along with the response.
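For the caching part of the question, one possible approach is to let the hook itself store the HTML. This is only a sketch under assumptions: html_cache, store_html, fetch_all and urls_containing are hypothetical names, the cache is a plain in-memory dict, and fetch_all assumes grequests is installed (it is imported lazily so the rest of the code works without it).

```python
html_cache = {}  # maps response URL -> HTML text


def store_html(response, **kwargs):
    """Response hook: remember the body of each response, keyed by its URL.

    **kwargs absorbs the extra keyword arguments the dispatcher passes in.
    """
    html_cache[response.url] = response.text
    return response


def fetch_all(url_list):
    """Send all GET requests concurrently; the hook fills html_cache."""
    import grequests  # third-party; imported here so the hook can be used without it

    pending = [grequests.get(u, hooks={'response': store_html}) for u in url_list]
    grequests.map(pending)
    return html_cache


def urls_containing(needle):
    """Run one check against the cached HTML without any new HTTP requests."""
    return sorted(url for url, html in html_cache.items() if needle in html)
```

After a single call to fetch_all(urls), any number of checks such as urls_containing('some string') can run against the cache. Note that response.url is the URL of the response the hook receives, which may differ from the requested URL after a redirect, and that requests which fail outright never reach the hook.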

Matias
    Thanks! I probably searched 20 different phrases before asking this question but I never saw that one. The developer should probably update the docs. – Kernel Panic Jul 31 '13 at 18:55