Is the multiprocessing python package working on Google Cloud ml-engine?

Question

I'm trying to parallelize my preprocessing function with the use of the multiprocessing python package (which can be found here: https://docs.python.org/3.4/library/multiprocessing.html?highlight=process).

It works fine on my computer (my 4 CPUs are used), however it seems it's not working when I run my code on a google cloud ml-engine job. The job is taking much more time than the sequential equivalent, and cpu utilisation is going down to almost 0% at some point.

Here is my code attempt:

import multiprocessing as mp

pool = mp.Pool(processes=mp.cpu_count())
params = [ some_params_lists]
pool.starmap(fn_to_run_in_parallel, params)
pool.close()
pool.join()

I also tried using multiprocessing.Process() without any luck.

Configuration of the machine:

ScaleTier = 'CUSTOM'

masterTYpe = 'large_model'

Can you please comment on which machine types you are using in the your ml-engine job? — rhaertel80, Jan 23 '19 at 15:21
Hi, sure. I just edited the question. I use a custom machine with large_model — Benjamin Larrousse, Jan 23 '19 at 15:36
May we assume that nothing else is happening after pool.join()? (Just want to rule out the possibility that code has proceeded beyond the mp code) — rhaertel80, Jan 24 '19 at 16:09

score 0 · Answer 1 · answered Feb 22 '19 at 01:33

0

I don't think Google Cloud ml-engine has anything to do with the slowdown. The code working on your machine will work the same way in Google Cloud VMs.

Multiprocessing does not necessary mean faster processing. In cases where the dispatch overhead overrides the multi processing gain, it will become even slower than single processing.

There're tons of discussions on multiprocessing slower than single processing in stackoverflow. e.g. Python multiprocessing is taking much longer than single processing

I would suggest you add time before and after the processing logic, and run in your machine and in Google cloud ml-engine respectively to get the exact latency in multiprocessing and single processing. e.g.

import time
start = time.time()
#your code in multiprocessing
end = time.time()
print(end - start)

start = time.time()
#your code in single processing
end = time.time()
print(end - start)

Google Cloud ml-engine takes time to ramp up VM which can be the latency you saw. The code above will tell you the exact latency.

answered Feb 22 '19 at 01:33

Bo yang

136
1
3

You're saying: "The code working on your machine will work the same way in Google Cloud VMs." But how can you be sure ? I'm saying exactly that it is not. The single processing is much slower than the multiprocessing in my computer, however I can not reproduce the same thing on Cloud ml-engine. – Benjamin Larrousse Feb 22 '19 at 16:31
Do you mind adding the time code above to measure the exact latency? What's the memory size of your own machine? Can you share your processing logic? Given same memory and CPU, the results should be the same when executing in your machine vs. in cloud. – Bo yang Feb 22 '19 at 17:57
I know it should be. I already did add the time code, but the process did not terminate (I had to do it manually). With an other example (and with pool.imap) the code was 4 times faster with multiprocessing on 4 cores (in my machine) than single processing, but in the cloud it was the same amount of time as single processing. My machine is 16Go memory with 16go swap (the cloud ml-engine machine is bigger than that) – Benjamin Larrousse Feb 22 '19 at 18:02
What did you mean by the process did not terminate and you had to terminate manually? It doesn't seem right that the process didn't terminate. I tried multiprocessing in my machine and cloud, both ended with similar latency. – Bo yang Feb 22 '19 at 19:59
Can you share your whole code? I'll try to run on my end. Did Google cloud ml-engine print any of your logs? – Bo yang Feb 22 '19 at 20:00
It means nothing was going on, the cpu usage went down to almost 0 and nothing happened. I can not share the whole code because it's too many things to share – Benjamin Larrousse Feb 25 '19 at 08:40

score -1 · Answer 2 · edited Jul 27 '22 at 00:40

-1

It is hard to answer since no real question is at hand and furthermore no code to review. My guess is that your code gets stuck in the processes and never join()'s, that's probably why CPU goes down to 0%.

edited Jul 27 '22 at 00:40

halfer

19,824
17
99
186

answered Jul 22 '22 at 10:00

niCk cAMel

869
1
10
26

Is the multiprocessing python package working on Google Cloud ml-engine?

2 Answers2