
For the life of me I don't get what "num_envs_per_worker" does. If the limiting factor is policy evaluation, why would we need to create multiple environments? Wouldn't we need to create multiple policies instead?

ELI5 please?

The docs say:

Vectorization within a single process: Though many envs can achieve high frame rates per core, their throughput is limited in practice by policy evaluation between steps. For example, even small TensorFlow models incur a couple milliseconds of latency to evaluate. This can be worked around by creating multiple envs per process and batching policy evaluations across these envs. You can configure {"num_envs_per_worker": M} to have RLlib create M concurrent environments per worker. RLlib auto-vectorizes Gym environments via VectorEnv.wrap().

Src: https://ray.readthedocs.io/en/latest/rllib-env.html
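For reference, here's a rough sketch of where that option goes in an RLlib config. This uses the old-style trainer API that matches those docs; exact class and key names may differ across RLlib versions, so treat it as illustrative rather than definitive:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer  # old-style RLlib API; may vary by version

ray.init()

trainer = PPOTrainer(config={
    "env": "CartPole-v0",
    "num_workers": 2,          # separate rollout worker processes
    "num_envs_per_worker": 8,  # 8 envs stepped in lockstep inside each worker, so
                               # each policy forward pass sees a batch of 8 observations
})

result = trainer.train()
print(result["episode_reward_mean"])
```

There is still only one policy; the extra environments just feed it bigger batches per evaluation call.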

Andriy Drozdyuk
  • Hi, I'm curious as well. Would "...batching policy evaluations across these envs." be a cost saving factor? – Huan Apr 07 '20 at 04:37

1 Answer


Probably a bit late on this, but here's my understanding:

  • as the docs you cited mention, there's a significant fixed per-call overhead to using TensorFlow (converting data into the appropriate structures, the overhead and coordination of passing data to the GPU, etc.)
  • however, you can call a TensorFlow model with a batch of data, and the execution time generally scales nicely: roughly linearly in the limit, and when going from a single row to a few rows it might even scale sub-linearly. E.g. if you are going to pass 1 row of data to a vector processing unit like a GPU (or specialised CPU instructions), you might as well pass as many rows as it can handle in one go; it won't actually take any more time, since those parallel execution units would just have been sitting idle otherwise.
  • therefore, you want to batch up rows of data so that you pay the fixed per-call cost as infrequently as possible. One way of doing this is to have several RL environments executing in lockstep. Say you have 8 of them: each of the 8 environments produces its own observation, you call your TensorFlow model once on this batch of 8 observations to produce 8 new actions, you use those actions to produce 8 new observations, and so on. Amortised, this will hopefully cost only 1/8th of what it would if each environment made its own TensorFlow call per step (see the sketch below).
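Here's a toy sketch of that lockstep loop, just to make the shape of it concrete. Everything in it (the ToyEnv class, the policy_batch function) is a made-up stand-in, not RLlib or TensorFlow code:

```python
import numpy as np

NUM_ENVS = 8           # analogous to num_envs_per_worker
OBS_DIM, NUM_ACTIONS = 4, 2

rng = np.random.default_rng(0)

# Stand-in for the policy network: one call takes a whole batch of
# observations and returns a whole batch of actions.
weights = rng.normal(size=(OBS_DIM, NUM_ACTIONS))

def policy_batch(obs_batch):
    logits = obs_batch @ weights          # shape (NUM_ENVS, NUM_ACTIONS)
    return logits.argmax(axis=1)          # one action per environment

# Stand-in for an environment: step() takes one action, returns one observation.
class ToyEnv:
    def reset(self):
        return rng.normal(size=OBS_DIM)
    def step(self, action):
        return rng.normal(size=OBS_DIM)

envs = [ToyEnv() for _ in range(NUM_ENVS)]
obs = np.stack([env.reset() for env in envs])          # shape (NUM_ENVS, OBS_DIM)

for t in range(100):
    actions = policy_batch(obs)                        # ONE policy call for all 8 envs
    obs = np.stack([env.step(a) for env, a in zip(envs, actions)])
```

The fixed per-call cost of policy_batch is paid once per timestep instead of once per environment per timestep, which is exactly the saving num_envs_per_worker is meant to capture.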
Andrew Rosenfeld