
I have only one gpu, and I want to run many actors on that gpu. Here's what I do using ray, following https://ray.readthedocs.io/en/latest/actors.html

  1. First, define the network on the GPU:
class Network():
    def __init__(self, ***some args here***):
        self._graph = tf.Graph()
        os.environ['CUDA_VISIBLE_DEVICES'] = ','.join([str(i) for i in ray.get_gpu_ids()])
        with self._graph.as_default():
            with tf.device('/gpu:0'):
                # network, loss, and optimizer are defined here

        sess_config = tf.ConfigProto(allow_soft_placement=True)
        sess_config.gpu_options.allow_growth=True
        self.sess = tf.Session(graph=self._graph, config=sess_config)
        self.sess.run(tf.global_variables_initializer())
        atexit.register(self.sess.close)

        self.variables = ray.experimental.TensorFlowVariables(self.loss, self.sess)
  2. Then define the worker class:
@ray.remote(num_gpus=1)
class Worker(Network):
    # do something
  3. Define the learner class:
@ray.remote(num_gpus=1)
class Learner(Network):
    # do something
  4. Write the train function:
def train():
    ray.init(num_gpus=1)
    learner = Learner.remote(...)
    workers = [Worker.remote(...) for i in range(10)]
    # do something

This process works fine when I don't try to use the GPU. That is, it works fine when I remove all the `with tf.device('/gpu:0')` blocks and the `(num_gpus=1)` arguments. The trouble arises when I keep them: it seems that only the learner is created, and none of the workers is constructed. What should I do to make it work?

Maybe

2 Answers


When you define an actor class using the decorator @ray.remote(num_gpus=1), you are saying that any actor created from this class must have one GPU reserved for it for the duration of the actor's lifetime. Since you have only one GPU, you will only be able to create one such actor.

If you want to have multiple actors sharing a single GPU, then you need to specify that each actor requires less than 1 GPU. For example, if you wish to share one GPU among 4 actors, you can have each actor require one quarter of a GPU. This can be done by declaring the actor class with

@ray.remote(num_gpus=0.25)

In addition, you need to make sure that each actor actually respects the limits that you are placing on it. For example, if you declare an actor with @ray.remote(num_gpus=0.25), then you should also make sure that TensorFlow uses at most one quarter of the GPU memory. See the answers to How to prevent tensorflow from allocating the totality of a GPU memory? for example.
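
As a rough illustration, here is a minimal sketch of how a worker class might be declared so that four actors share the single GPU, using the TF1-style per_process_gpu_memory_fraction option to cap TensorFlow's allocation; the class body is a placeholder, not the question's actual implementation:

import ray
import tensorflow as tf

@ray.remote(num_gpus=0.25)  # four such actors can be scheduled on one physical GPU
class Worker:
    def __init__(self):
        # Cap this process's TensorFlow allocation to roughly a quarter of the GPU,
        # matching the fraction requested from Ray in the decorator above.
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.25)
        config = tf.ConfigProto(allow_soft_placement=True, gpu_options=gpu_options)
        self.sess = tf.Session(config=config)
        # define the network, loss, and optimizer here, as in the question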

Robert Nishihara
  • Thanks, it works. May I ask a further question? I noticed that the `learner` actually allocates less GPU memory than a worker does; what makes the difference? A worker is responsible for interacting with a `gym` environment, computing gradients, and sending them to the learner. The learner applies the gradients and returns network weights to the workers – Maybe Feb 03 '19 at 11:29
  • @SherwinChen, that's tough to say without seeing the definitions. However, it's plausible that the gradient computation requires more GPU memory than anything else because the activations from the forward pass need to be saved so they can be used by the backward pass. This often takes up a bunch of memory. – Robert Nishihara Feb 04 '19 at 20:24
  • @RobertNishihara, if you do `@ray.remote(num_gpus=0.25)`, will the computations of the various actors be truly parallel (run at the same time on different CUDA cores), or will they be time-sliced? I thought NVIDIA GPUs couldn't run multiple processes in parallel (unless using MIG or MPS) https://stackoverflow.com/questions/31643570/running-more-than-one-cuda-applications-on-one-gpu – Olivier Cruchant Dec 09 '21 at 17:17
  • @OlivierCruchant Ray won't quite do either of those. Ray will simply allow 4 such tasks to be scheduled on that GPU. However, it is up to the function itself to limit its memory usage (or other usage). This is typically done through a library like TensorFlow or PyTorch. – Robert Nishihara Dec 15 '21 at 04:51
  • I needed to load 16 model actors in a 2-GPU setting on a single node. When I set `ray.init(num_gpus=2)` and `@ray.remote(num_gpus=0.125)`, this configuration loads models only on one GPU and not on the second. How do I make sure both of my GPUs are utilized when loading multiple actors in a fractional-GPU setting? – Anirudh Gupta May 17 '22 at 17:28
  • @AnirudhGupta a few thoughts. Could some of the tasks be finishing quickly before the other tasks start, so that both GPUs are not needed? Inside the tasks, can you check `os.environ["CUDA_VISIBLE_DEVICES"]` to confirm that sometimes it is 0 and sometimes it is 1? That is how Ray controls which GPU is used. – Robert Nishihara May 18 '22 at 18:15
  • @AnirudhGupta, when you have several GPUs, what worked for me is the following. Ray works with logical rather than physical GPUs, so you can safely init Ray with as many GPUs as tasks you want to run, in your case `ray.init(num_gpus=16)`. Then enable memory growth on each physical GPU: `physical_devices = tf.config.list_physical_devices('GPU')`, and for each device call `tf.config.experimental.set_memory_growth(device, True)`. Finally, each actor should be decorated with `@ray.remote(num_gpus=1)`. – Kuzman Belev Sep 01 '23 at 19:20

In case anyone wants to run Ray on a multi-GPU system and run TensorFlow functionality in parallel, one can approach the problem as follows. Say you have 2 GPUs and want to run 16 actors.

  1. Ray resources are logical and don't need to have a 1-to-1 mapping with physical resources, so one can safely init Ray as follows:
ray.init(num_gpus=16)
  2. Configure memory growth for each physical GPU:
physical_devices = tf.config.list_physical_devices('GPU')
for device in physical_devices:
    tf.config.experimental.set_memory_growth(device, True)
  3. Define the actors and decorate them with @ray.remote(num_gpus=1):
@ray.remote(num_gpus=1)
class Simulator:
    def __init__(self, i, *args, **kwargs):
        # do init
        pass

    def simulate(self):
        # check the logical GPU assigned by Ray; it will be in the range [0, 15]
        gpu_id = ray.get_gpu_ids()[0]
        print("gpu_id:", gpu_id)
        # do the tf magic you want
        # ...
  4. Create and run the actors:
# Create the actors
simulators = [Simulator.remote(i, args, kwargs) for i in range(16)]

# Run the simulations in parallel
results = ray.get([s.simulate.remote() for s in simulators])

Both GPUs will be utilized and the actors will run in parallel.

This setup works for me with TensorFlow 2.10.1 and Ray 2.6.1.

Kuzman Belev