
I have a program whose bottleneck is API calls, so I want the API calls to happen at the same time. In pseudocode, this is what I would like:

from multiprocessing import Process, Manager

urls = ['www.example.com/item/1', 'www.example.com/item/2', 'www.example.com/item/3']

def get_stats(url, d):
    data = http.get(url)  # placeholder for the real HTTP call
    d[data['name']] = data['data']

manager = Manager()

d = manager.dict()

for url in urls:
    p = Process(target=get_stats, args=(url, d))
    p.start()
    p.join()

print(d)

The only thing is that these processes don't seem to be running in parallel.

Is it because I am placing the join() after starting the process?

What is the best way to implement this?


1 Answer


these processes don't seem to be running in parallel

The join() inside your "starter loop" waits for each process to terminate before starting the next one.

Try something like this, instead:

procs = []
for url in urls:
    p = Process(target=get_stats, args=(url, d))
    p.start()          # start every worker without waiting
    procs.append(p)

for p in procs:
    p.join()           # then wait for all of them to finish

You might also want to have a look at the answer to Pool with worker Processes, since for your workload a process Pool seems like a good fit.
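For illustration, here is a minimal sketch of the Pool approach. It keeps the question's placeholder http.get call, has each worker return its result so no shared Manager dict is needed, and the worker count of 4 is an arbitrary choice:

from multiprocessing import Pool

urls = ['www.example.com/item/1', 'www.example.com/item/2', 'www.example.com/item/3']

def get_stats(url):
    data = http.get(url)               # placeholder HTTP call, as in the question
    return data['name'], data['data']  # return the result instead of writing to a shared dict

if __name__ == '__main__':
    pool = Pool(processes=4)             # assumed worker count
    results = pool.map(get_stats, urls)  # blocks until every URL has been fetched
    pool.close()
    pool.join()
    d = dict(results)
    print(d)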

  • thank you, the last for loop there, won't that now also make each subsequent process wait for the next one in the loop? – Brandon Swallow Sep 14 '15 at 15:39
  • @BrandonSwallow The last loop makes the main process wait for all the child processes, one by one. The children still run in parallel before and during that. – dhke Sep 15 '15 at 07:09
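A quick way to convince yourself of this (a small self-contained sketch, not from the original exchange) is to replace the HTTP call with a sleep and time the whole run; with three workers the total is roughly one sleep, not three:

import time
from multiprocessing import Process

def work(i):
    time.sleep(2)                     # stand-in for a slow API call
    print('worker', i, 'done')

if __name__ == '__main__':
    start = time.time()
    procs = [Process(target=work, args=(i,)) for i in range(3)]
    for p in procs:
        p.start()                     # all three start immediately
    for p in procs:
        p.join()                      # main process waits; children already ran in parallel
    print('elapsed:', round(time.time() - start, 1), 'seconds')  # ~2, not ~6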