
Sorry for the title, I know it's a bit vague, but I'm having a hard time with our design and I need help!

So we have a trained model that we want to use for car detection on images. We have a lot of images coming from multiple cameras into our Node.js backend. What we are looking to do is create multiple workers (child_process) and send an image path via stdin to each one of them so they can process it and return the results (1 image per worker per run).
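To give an idea, each worker looks roughly like this (detect_cars here is just a stand-in for our actual model call, not real code from our project):

import sys
import json

# Each worker stays alive and handles one image per request:
# the Node.js parent writes one image path per line on stdin,
# and the worker answers with one JSON line on stdout.
for line in sys.stdin:
    path = line.strip()
    if not path:
        continue
    boxes = detect_cars(path)  # placeholder for the TensorFlow detection call
    print(json.dumps({'path': path, 'cars': boxes}))
    sys.stdout.flush()  # flush so the parent sees the result immediately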

The workers are Python 3 scripts, so they all run the same code. This means we have multiple TensorFlow sessions, and that created a problem: I can't find a way to run multiple sessions on the same GPU... Is there a way to do this?

If not, how can I achieve my goal of running those images in parallel with only 1 GPU? Maybe I can create 1 session and attach to it from my workers? I'm very new to this, as you can see!

Btw, I'm running all of this in a Docker container with a GTX 960M (yes, I know... better than nothing I guess).

WoofWoofDude

1 Answer


By default, a tensorflow session will hog all GPU memory. You can override the defaults when creating the session. From this answer:

import tensorflow as tf

# Cap this process at roughly a third of the GPU's memory so other sessions can share the card.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
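If picking a fixed fraction is awkward, TF 1.x also has an allow_growth option that only grabs GPU memory as it is actually needed, e.g.:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving a fixed fraction up front.
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))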

That said, graph building/session creation is much more expensive than just running inference on an existing session, so you don't want to do that for each individual query image. You are better off running a server that builds the graph, starts the session, loads the variables etc., and then responds to queries as they come in. If you want more parallelism than that, you can still run multiple such servers on the same GPU, each with its own session, using the memory option above.
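A rough sketch of what that looks like for one of your stdin-fed workers, assuming TF 1.x and a frozen graph (model.pb, the tensor names and the preprocessing are placeholders for whatever your model actually uses):

import sys
import json
import numpy as np
import tensorflow as tf
from PIL import Image

# --- done once, at worker startup ---
graph_def = tf.GraphDef()
with tf.gfile.GFile('model.pb', 'rb') as f:  # placeholder path to your frozen detection graph
    graph_def.ParseFromString(f.read())
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(graph=graph, config=tf.ConfigProto(gpu_options=gpu_options))

image_t = graph.get_tensor_by_name('image:0')             # hypothetical input tensor name
detections_t = graph.get_tensor_by_name('detections:0')   # hypothetical output tensor name

# --- then respond to queries as they come in ---
for line in sys.stdin:
    path = line.strip()
    if not path:
        continue
    img = np.expand_dims(np.asarray(Image.open(path).convert('RGB')), 0)
    result = sess.run(detections_t, feed_dict={image_t: img})
    print(json.dumps({'path': path, 'detections': result.tolist()}))
    sys.stdout.flush()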

Check out tensorflow serving for a lot more on this.

DomJack
  • I'm gonna try to split the memory, but how much should I give each instance? It's hard to tell; right now with a single worker it is set to 1.6 GB and I'm limited to 2 GB on this computer. For the 2nd part, the workers already load and start everything at the start of the process, so they are just waiting for data to come in. – WoofWoofDude Apr 15 '18 at 03:45
  • Thanks, splitting the memory worked. I ran some tests and with about 270 MB each they are able to run... But there is a problem: I think I max out my GPU with only 2 workers.. 1=50ms, 2=80ms, 4=130ms, 6=180ms – WoofWoofDude Apr 15 '18 at 04:20
  • I found similar behaviour when I tried training multiple models simultaneously. If you're doing lots of CPU preprocessing the bottleneck may be there? I haven't done anything with tensorflow serving, but I'd definitely check it out - sounds like this is exactly what it was designed for. – DomJack Apr 16 '18 at 01:12