
I wrote the following code:

import logging
from datetime import datetime

import eventlet
import requests
import redis

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

redis = redis.StrictRedis(host="localhost", port=6379, db=0)

proxy_1_pool = eventlet.GreenPool(40)

def fetch_items():
    for _ in range(0, 400):
        proxy_1_pool.spawn(fetch_listing)

    proxy_1_pool.waitall()

def fetch_listing():
    logger.info("START fetch: " + str(datetime.utcnow()))
    url_info = redis.spop("listings_to_crawl")
    content = make_request(url_info)
    logger.info("END fetch: " + str(datetime.utcnow()))
    if content:
        do_something(content)

def make_request(url_info):
    r = requests.get(url_info)
    return r.content

def main():
    fetch_items()

if __name__ == "__main__":
    main()

Unfortunately I see that fetch_listing is being invoked sequentially.

It would always print:

START
END
START 
END

While I would expect to see:

START
START
END 
END
Dejell

1 Answer

What's going on:

  • you asked eventlet to execute multiple fetch_listing() calls concurrently, and it did as ordered; you can verify this by putting eventlet.sleep() right after the logger.info("START ...") line. Parallel, as in the question title, is never going to happen with green threads, forget about it.
  • then execution was blocked by redis.spop and requests.get: neither call yields control back to the eventlet hub, so each green thread runs to completion before the next one gets a turn.
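The effect described above can be sketched with stdlib asyncio, used here purely as a stand-in for eventlet's hub (the stand-in and the sleep-based fake fetches are assumptions of this sketch, not part of the original code): a blocking time.sleep() serializes the tasks, while a cooperative await interleaves them.

```python
import asyncio
import time

events = []

async def blocking_fetch(i):
    events.append(f"START {i}")
    time.sleep(0.05)           # blocking call: never yields to the event loop
    events.append(f"END {i}")

async def cooperative_fetch(i):
    events.append(f"START {i}")
    await asyncio.sleep(0.05)  # yields, so other tasks can run meanwhile
    events.append(f"END {i}")

async def run(fetch):
    events.clear()
    await asyncio.gather(*(fetch(i) for i in range(2)))
    return list(events)

blocking = asyncio.run(run(blocking_fetch))
interleaved = asyncio.run(run(cooperative_fetch))
print(blocking)     # START/END pairs, one after another
print(interleaved)  # both STARTs before either END
```

The blocking variant prints the same START/END/START/END pattern the question reports; the cooperative one prints the START/START/END/END pattern the asker expected.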

What you do to make blocking code cooperate with eventlet: monkey patch the standard library, or offload the blocking calls to a thread pool.

-import eventlet
+import eventlet ; eventlet.monkey_patch()
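The other option, offloading blocking calls to real OS threads, can be sketched with stdlib concurrent.futures (the fake blocking_fetch below is an assumption for illustration; it stands in for the requests.get / redis.spop calls):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_fetch(i):
    # stand-in for requests.get / redis.spop: blocks only its own OS thread
    time.sleep(0.05)
    return f"content-{i}"

with ThreadPoolExecutor(max_workers=40) as pool:
    start = time.monotonic()
    results = list(pool.map(blocking_fetch, range(40)))
    elapsed = time.monotonic() - start

print(results[0], results[-1])
# the 40 blocking calls overlap, so total time is close to one call's
# 0.05 s rather than the 2 s that 40 sequential calls would take
print(elapsed < 1.0)
```

In eventlet itself the equivalent is wrapping the blocking call so a worker thread does the blocking I/O while the green thread yields, e.g. via eventlet.tpool.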

Closely related questions, highly recommended reading:

temoto
  • Thanks. Just wondering about `eventlet.GreenPool(40)`: does that mean that at most 40 requests will occur at the same time? Because it seems to me that many more actually happen, as my proxy server is limited to 40 – Dejell Nov 28 '17 at 05:41
  • `GreenPool(40)` should work as you expect. If that's not the case, please gather some data or reproduction script and post issue at https://github.com/eventlet/eventlet – temoto Nov 29 '17 at 09:11