
I have a function I'm calling with multiprocessing.Pool

Like this:

from multiprocessing import Pool

def ingest_item(id):
    # makes a lot of network calls
    # and adds a bunch of rows to a remote db
    return None

if __name__ == '__main__':
    p = Pool(12)
    thing_ids = range(1000000)
    p.map(ingest_item, thing_ids)

The list pool.map is iterating over contains around 1 million items; each ingest_item() call goes out to 3rd-party services and adds data to a remote PostgreSQL database.

On a 12-core machine this processes ~1,000 pool.map items in 24 hours. CPU and RAM usage is low.

How can I make this faster?

Would switching to threads make sense, since the bottleneck seems to be network calls?

Thanks in advance!

2 Answers


First: remember that you are performing a network task. You should expect your CPU and RAM usage to be low, because the network is orders of magnitude slower than your 12-core machine.

That said, it's wasteful to have one process per request. If you start running into problems from starting too many processes, you might try pycurl, as suggested here: Library or tool to download multiple files in parallel

This pycurl example looks very similar to your task: https://github.com/pycurl/pycurl/blob/master/examples/retriever-multi.py

– Neal Ehardt
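For context, the linked retriever-multi example boils down to libcurl's multi interface: one process driving many concurrent transfers. The sketch below is a heavily simplified version of that pattern, and it assumes the work can be expressed as plain URL fetches, which is narrower than what ingest_item actually does:

import pycurl

# Hypothetical URLs standing in for the per-item network calls.
urls = ["https://api.example.com/items/%d" % i for i in range(100)]

multi = pycurl.CurlMulti()
handles = []
for url in urls:
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEFUNCTION, len)  # discard the response body
    multi.add_handle(c)
    handles.append(c)

# Drive all transfers concurrently until every handle has finished.
num_active = len(handles)
while num_active:
    ret, num_active = multi.perform()
    while ret == pycurl.E_CALL_MULTI_PERFORM:
        ret, num_active = multi.perform()
    if num_active:
        multi.select(1.0)  # wait for socket activity

for c in handles:
    multi.remove_handle(c)
    c.close()
multi.close()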
  • I can't easily go in and alter the code within the function being called. It performs a lot of tasks and checks, and the network calls are done through packages from PyPI; I don't want to spend weeks editing them just for this task. (I'd rather rent a higher-core machine than do that.) – Pythonsnake99 Jul 30 '15 at 19:11
  • I should also mention the issue isn't the number of processes; I'm using all 12 on the machine just fine. Nor is network bandwidth anywhere near full. In an ideal world I'd magically add 100 cores to the machine, but that is expensive. – Pythonsnake99 Jul 30 '15 at 19:13
  • You can execute more than 12 processes on a 12-core machine. Try 100 and see what happens. – Neal Ehardt Jul 30 '15 at 19:14
  • Eventually, you will stop seeing gains from increasing process count. That likely means you've flooded the network. At that point, consider deploying to a machine with a faster connection. – Neal Ehardt Jul 30 '15 at 19:17
  • Do you have a link relating to this? I've been using `multiprocessing.cpu_count()` so far. Edit: the machine also has a 1gb/s connection and is only using around 1-2mb/s currently. – Pythonsnake99 Jul 30 '15 at 19:18
  • @Pythonsnake99: If each thread is bottlenecked on network latency, not local CPU availability, then the number of logical CPUs isn't really relevant to the number of threads you should start. You've already said CPU utilization is low while your code is running. If you're bottlenecked on remote CPU or disk, you won't see any gains from having more requests active concurrently. – Peter Cordes Jul 31 '15 at 03:18
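Picking up these comments: because each worker spends nearly all of its time blocked on the network, the pool size does not have to match multiprocessing.cpu_count(). A minimal sketch of both options, worth trying only as an experiment (the 200 below is an arbitrary number to tune, not a figure from the answer):

from multiprocessing import Pool
# multiprocessing.dummy exposes the same Pool API backed by threads;
# the GIL is released while a thread waits on a socket, so this is
# usually fine for I/O-bound work like ingest_item.
from multiprocessing.dummy import Pool as ThreadPool

def ingest_item(id):
    # unchanged: network calls plus writes to the remote database
    return None

if __name__ == '__main__':
    thing_ids = range(1000000)

    # Option 1: far more worker processes than cores.
    p = Pool(200)
    p.map(ingest_item, thing_ids)

    # Option 2: threads instead of processes (cheaper per worker).
    # tp = ThreadPool(200)
    # tp.map(ingest_item, thing_ids)

With a million items it may also be worth using imap_unordered with a chunksize instead of map, so results stream back rather than piling up in one list, but the dominant cost is still the network round trips.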

It is unlikely that using threads will substantially improve performance. No matter how much you break up the task, all requests still have to go through the network.

To improve performance, you might want to check whether the 3rd-party services offer some kind of bulk request API, so that each request covers many items.
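If such a bulk endpoint exists, the change on the Python side can stay small: map over batches of IDs instead of single IDs, so one request (and one handshake) covers many items. Everything named ingest_items_bulk below is hypothetical and stands in for whatever bulk call the service might offer:

from multiprocessing import Pool

BATCH_SIZE = 100  # arbitrary; tune to whatever the bulk API allows

def ingest_items_bulk(id_batch):
    # Hypothetical: one request carrying a whole batch of IDs,
    # e.g. a single POST with all of them in the body.
    return None

def batches(seq, size):
    # Yield consecutive slices of seq, each of length size.
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

if __name__ == '__main__':
    thing_ids = range(1000000)
    p = Pool(12)
    p.map(ingest_items_bulk, list(batches(thing_ids, BATCH_SIZE)))

Batching keeps the outer Pool exactly as it is; only the unit of work per call grows.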

If your workload permits it, you could try some kind of caching. However, from your description of the task it sounds like that would have little effect, since you're primarily sending data, not requesting it. You could also consider caching open connections (if you aren't already doing so); this helps avoid the very slow TCP handshake. This type of caching is used in web browsers (e.g. Chrome).

Disclaimer: I have no Python experience
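On the connection-caching point above: if the HTTP traffic goes through a client you control, keeping one session per worker reuses open TCP (and TLS) connections across requests instead of re-handshaking every time. The sketch below assumes the requests library and a hypothetical endpoint; whether the PyPI packages actually in use allow this is not something the post tells us:

from multiprocessing import Pool
import requests

session = None  # one Session per worker process, set up by the initializer

def init_worker():
    global session
    # requests.Session keeps connections alive (HTTP keep-alive), so
    # repeated calls to the same host skip the TCP/TLS handshake.
    session = requests.Session()

def ingest_item(id):
    # Hypothetical endpoint standing in for the real 3rd-party calls.
    resp = session.post("https://api.example.com/items", json={"id": id})
    resp.raise_for_status()
    return None

if __name__ == '__main__':
    p = Pool(12, initializer=init_worker)
    p.map(ingest_item, range(1000000))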

  • I've already implemented caching, but that only helps on duplicate requests (which do happen, just not often enough to give the speedup I need). To use the bulk API option I'd have to create a Pool within a Pool, which isn't a good idea. – Pythonsnake99 Jul 30 '15 at 19:17
  • Perhaps the bottleneck is network latency; that can be minimized by keeping connections open, so you avoid the excessive TCP handshakes caused by re-opening the connection on every call to ingest_item. I'm not sure whether Python does this by default, but it's something to look into. – NitrogenReaction Jul 30 '15 at 19:28