
EDIT: I'm using Python 3.5.0, and so map() will return an iterator instead of a list, unlike Python 2.x

I have a list of units, and I am calling a REST API on each of them to return more data about them. I'm using map() to do this, but when I try to convert that map to a list, the program hangs there and doesn't proceed (both when I run it and when I debug it).

data = list(map(lambda product: client.request(units_url + "/" + product), units))

At first I thought it might be an issue with calling the API so quickly, but when I iterate through the map manually (without converting it to a list) and print each item, it works just fine:

data = map(lambda product: client.request(units_url + "/" + product), units)
for item in data:
    print(item)    # <-- this works just fine for the entire map

Anyone know why I'm getting this behavior?

yiwei
  • yep! `units` is just a list of strings – yiwei Oct 08 '15 at 19:10
  • Why are you trying to convert a list into a list again? `data = list(map(..., units))` since, as you said, units is a list – xiº Oct 08 '15 at 19:10
  • because in python 3.x+ `map()` returns an iterator instead of a `list`, so i need to convert it back – yiwei Oct 08 '15 at 19:12
  • try this: `data = list(client.request(units_url + "/" + product) for product in units)` – xiº Oct 08 '15 at 19:18
  • What happens if you use a list comprehension? Either `[x for x in map(...)]` or `[client.request(...) for product in units]` – tobias_k Oct 08 '15 at 19:19
  • @tobias_k: Thanks for finally mentioning the proper form (`[client.request(...) for product in units]`). Wrapping generator expressions in the `list` constructor as user3990145 is doing is just reinventing the list comprehension, but more slowly (because looking up the name `list` costs overhead that the syntax construction with `[]` does not). That said, as I noted in a comment on an answer, the cost for a map over a list comprehension is at worst microseconds; a single network request costs milliseconds, so `list(map(a, b))` vs. `[a(x) for x in b]` is meaningless. – ShadowRanger Oct 08 '15 at 19:49

2 Answers


When you list-ify the map, that means every single request is dispatched serially, waits for completion, then stores to the resulting list. If you're dispatching 1000 requests, that means each request must complete in order, one by one, before the list is constructed and you see the first result; it's entirely synchronous.

You get results (almost) immediately in the direct map iteration case because it only makes one request at a time; instead of waiting for 1000 requests, it waits for 1, you process that result, then it waits for another, etc.
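
To make the difference concrete, here is a minimal sketch of the lazy behavior. The `fake_request` function and the `calls` list are hypothetical stand-ins for `client.request` (they are not part of the question's code); the short sleep only mimics network latency.

```python
import time

calls = []  # records which products have actually been "requested" so far


def fake_request(product):
    """Hypothetical stand-in for client.request(); sleeps to mimic latency."""
    calls.append(product)
    time.sleep(0.01)
    return "data for " + product


units = ["a", "b", "c", "d"]

# Creating the map object does not call fake_request at all.
lazy = map(fake_request, units)
calls_after_create = list(calls)

# Asking for one item performs exactly one request.
first = next(lazy)
calls_after_first = list(calls)

# list() drains everything that is left, one request after another,
# and only returns once the last request has finished.
rest = list(lazy)

print(calls_after_create, calls_after_first, calls)
```

With a real client and many units, that final `list()` call is where all the waiting happens, which can look like a hang even though each request is proceeding normally.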

If the goal is to minimize latency, take a look at multiprocessing.Pool.imap (or the thread based version of the pool implemented in multiprocessing.dummy; threads can be ideal for parallel network I/O requests and won't require pickling data for IPC). With the Pool's map, imap, or imap_unordered methods (choose one based on your needs), the requests will be dispatched asynchronously, several at a time (depending on the number of workers you select). If you absolutely must have a list, Pool.map will usually construct it faster; if you can iterate directly and don't care about the ordering of results, Pool.imap_unordered will get you results as fast as the workers can get them, in whatever order they are satisfied in. Plain map without a Pool isn't getting you any magical performance benefits (a list comprehension would usually run faster actually), so use a Pool.

Simple example code for fastest results:

import multiprocessing.dummy as multiprocessing  # Import thread based version of library; for network I/O should work fine

with multiprocessing.Pool(8) as pool:  # Pool of eight worker threads
    for item in pool.imap_unordered(lambda product: client.request(units_url + "/" + product), units):
        print(item)

If you really need to, you can use Pool.map and store to a real list, and assuming you have the bandwidth to run eight parallel requests (or however many workers you configure the pool for), that should (roughly) divide the time to complete the map by eight.
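
As a rough sketch of that `Pool.map` variant, the following compares a serial `list(map(...))` with a thread pool. The `fake_request` function is a hypothetical stand-in for `client.request` that just sleeps; real speedups depend on the server and your bandwidth, so treat the timing as illustrative only.

```python
import time
from multiprocessing.dummy import Pool  # thread-based Pool, same API as multiprocessing.Pool


def fake_request(product):
    """Hypothetical stand-in for client.request(); sleeps to mimic latency."""
    time.sleep(0.1)
    return "data for " + product


units = ["unit%d" % i for i in range(8)]

# Serial: each request waits for the previous one to finish.
start = time.perf_counter()
serial = list(map(fake_request, units))
serial_time = time.perf_counter() - start

# Parallel: eight worker threads run the requests concurrently.
# Pool.map returns a real list, in the same order as the input.
start = time.perf_counter()
with Pool(8) as pool:
    parallel = pool.map(fake_request, units)
parallel_time = time.perf_counter() - start

print("serial: %.2fs, pooled: %.2fs" % (serial_time, parallel_time))
```

Because `fake_request` is a named module-level function rather than a lambda, the same code should also work with the process-based `multiprocessing.Pool` (under an `if __name__ == "__main__":` guard), though threads are usually enough for network I/O.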

ShadowRanger
  • For the record, if you don't need order `imap_unordered` is the best possible version, since a single delayed network request won't prevent you from getting other results. With `Pool.map` and `Pool.imap`, if even one network request gets held up, you can end up with a long stall before you see any results at all. – ShadowRanger Oct 08 '15 at 19:45
  • thanks for that great explanation! Does that also account for why it still hangs when I save the `map()` into an intermediate variable `results`, then call `list(results)`? – yiwei Oct 08 '15 at 20:23
  • Yup. All the `list` constructor does is consume the iterator that `map` returns, one item at a time, until it stops getting new values. The normal `map` function is lazy; you could save it to a variable, wait 10 minutes, then ask for a value, and it still would have to scramble to get it for you; it hasn't started calculating anything at all. By contrast, the `Pool` equivalent methods (`imap`/`imap_unordered`) start working when you call them; if you delay before retrieving any results, the initial values will come instantly for a while. While you do other work between retrievals, the workers keep working too. – ShadowRanger Oct 08 '15 at 23:28
  • That's awesome. Thanks! +1 – yiwei Oct 08 '15 at 23:31

Better answer than I previously had. Check out this link. Near the bottom of the answer it gives a great analysis on why you should really use a list comprehension.

data = [ client.request(units_url + "/" + product) for product in units ]

RobertB
  • That will make next to no difference. It is faster, don't get me wrong, but it's faster in the sense of shaving a few microseconds per element processed at best. When we're talking about network requests, the costs typically start in the milliseconds per request (1000x longer); microsecond costs are noise. – ShadowRanger Oct 08 '15 at 19:43
  • Well the big difference is that navigating an iterator doesn't build the list, it just gets the results sequentially. Faster, no building of the big object. If you build the list, you are doing exactly that, getting the results sequentially, building a complex data structure and holding it in memory. So do you really need a list? If not, just stick with the iterator. – RobertB Oct 08 '15 at 19:47
  • And I'm curious... did you try the list comprehension? – RobertB Oct 08 '15 at 19:48