
I have recently been working on a project that uses a neural network for virtual robot control. I coded it up with TensorFlow and it runs smoothly. So far I have used sequential simulations to evaluate how good the neural network is, but I want to run several simulations in parallel to reduce the time it takes to collect data.

To do this I am importing Python's multiprocessing package. Initially I was passing the sess variable (sess = tf.Session()) to a function that runs the simulation. However, as soon as execution reaches any statement that uses this sess variable, the process quits without a warning. After searching around for a bit I found these two posts: Tensorflow: Passing a session to a python multiprocess and Running multiple tensorflow sessions concurrently

While they are highly related I haven't been able to figure out how to make it work. I tried creating a session for each individual process and assigning the weights of the neural net to its trainable parameters without success. I've also tried saving the session into a file and then loading it within a process, but no luck there either.

Has someone been able to pass a session (or clones of sessions) to several processes?

Thanks.

MrRed
    I suspect you can't pass clones of Sessions between processes because there's state in C address space that Python doesn't know how to copy. But creating brand new sessions in each new process should work. I haven't used multiprocessing, but I often have a couple of processes open in parallel that keep their own TensorFlow sessions – Yaroslav Bulatov Apr 13 '16 at 22:14
  • The second link I provided runs several processes in parallel, but the issue is that I need the neural network to be the same for ALL the processes. – MrRed Apr 13 '16 at 22:19
    You may be able to work-around it by using TensorFlow distributed -- ie, have a local worker and ps and create multiple sessions in parallel like `tf.Session("grpc://localhost:2222")` https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/how_tos/distributed/index.md – Yaroslav Bulatov Apr 13 '16 at 22:53
  • @YaroslavBulatov, any chance you could elaborate a little bit? I am implementing parallel TRPO (Viva RL) but am currently working with the minimal example [ https://gist.github.com/dd210/e8bad8eadc19f44cafcdc5313a39a53f ]. Basically I would like to launch a Session in the main process, do some calculations, then transfer the network to another X parallel processes and do some calculations in parallel there. Then repeat Y times. But I got stuck with this minimal example, because it seems that I need to work in one session all the time. The official tutorial is vague. I'd appreciate any help and advice. – dd210 Feb 03 '17 at 02:09
    Multiprocessing seems to be semi-broken with TensorFlow. Here's an example of using distributed tensorflow -- https://gist.github.com/yaroslavvb/ea1b1bae0a75c4aae593df7eca72d9ca – Yaroslav Bulatov Feb 03 '17 at 02:43
  • BTW, I've seen people use TF with MPI before, it's fine if you launch processes and create sessions in each process separately and use MPI to communicate, as opposed to importing tensorflow -> creating session -> fork – Yaroslav Bulatov Feb 03 '17 at 02:45
  • @YaroslavBulatov Exactly! I was one of those people, but it is clearly a rather awkward workaround, and not a perfect fit for my task either. Thank you for the example, I will look into it. – dd210 Feb 03 '17 at 02:52

2 Answers


You can't pass a TensorFlow Session into a multiprocessing.Pool in the straightforward way because Session objects can't be pickled: a Session is fundamentally not serializable, since it may manage GPU memory and similar process-local state.
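You can see the same failure mode without TensorFlow at all. In this sketch a threading.Lock, which also wraps state living outside the Python heap, stands in for a Session:

```python
import pickle
import threading

# A thread lock, like a tf.Session, wraps state that lives outside the
# Python heap, so pickle cannot serialize it.
try:
    pickle.dumps(threading.Lock())
except TypeError as exc:
    print("pickling failed:", exc)
```

This is the same TypeError that multiprocessing runs into when it tries to ship a Session to a worker process.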

I'd suggest parallelizing the code using actors, which are essentially the parallel computing analog of "objects" and are used to manage state in the distributed setting.

Ray is a good framework for doing this. You can define a Python class which manages the TensorFlow Session and exposes a method for running your simulation.

import ray
import tensorflow as tf

ray.init()

@ray.remote
class Simulator(object):
    def __init__(self):
        self.sess = tf.Session()
        self.simple_model = tf.constant([1.0])

    def simulate(self):
        return self.sess.run(self.simple_model)

# Create two actors.
simulators = [Simulator.remote() for _ in range(2)]

# Run two simulations in parallel.
results = ray.get([s.simulate.remote() for s in simulators])

Here are a few more examples of parallelizing TensorFlow with Ray.

See the Ray documentation. Note that I'm one of the Ray developers.

Robert Nishihara
    Thank you so much for this - I'd like to vouch that this also (maybe obviously) works with Keras. I needed a way to run prediction in multiple processes at once. My specific implementation entailed creating wrapper classes around my Keras models, each having their own `tf.Graph()` and `tf.Session()` objects, and creating a `ray` actor from an entry point module which exposes a single `run_prediction()` method. – David O'Neill Feb 21 '19 at 14:06

I use keras as a wrapper with tensorflow as a backend, but the same general principle should apply.

If you try something like this:

import keras
from functools import partial
from multiprocessing import Pool

def ModelFunc(i, SomeData):
    YourModel = Here
    return ModelScore

pool = Pool(processes=4)
# Bind SomeData by keyword so that imap's values are passed as i.
for i, Score in enumerate(pool.imap(partial(ModelFunc, SomeData=SomeData), range(4))):
    print(Score)

It will fail. However, if you try something like this:

from functools import partial
from multiprocessing import Pool

def ModelFunc(i, SomeData):
    # Importing keras here means each worker process loads the
    # framework and builds its own graph and session.
    import keras
    YourModel = Here
    return ModelScore

pool = Pool(processes=4)
for i, Score in enumerate(pool.imap(partial(ModelFunc, SomeData=SomeData), range(4))):
    print(Score)

It should work: each worker process imports tensorflow for itself, so every process gets its own graph and session instead of trying to share one across the fork.

June Skeeter