I ran the multiprocessing example for Stable Baselines 3 and everything worked as expected: https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb#scrollTo=pUWGZp3i9wyf

With num_cpu=4, multiprocessed training took approximately 3.6x less time than single-process training.

But when I use PPO instead of A2C, and BipedalWalker-v3 instead of CartPole-v1, the multiprocessing version is slower than the single-process one. What am I doing wrong? Why is it slower?

My code is:

import gym
import time

from stable_baselines3 import PPO
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

env_name = "BipedalWalker-v3"
num_cpu = 4
n_timesteps = 10000

env = make_vec_env(env_name, n_envs=num_cpu)

model = PPO('MlpPolicy', env, verbose=0)

start_time = time.time()
model.learn(n_timesteps)
total_time_multi = time.time() - start_time
print(f"Took {total_time_multi:.2f}s for multiprocessed version - {n_timesteps / total_time_multi:.2f} FPS")


single_process_model = PPO('MlpPolicy', env_name, verbose=0)
start_time = time.time()
single_process_model.learn(n_timesteps)
total_time_single = time.time() - start_time


print(f"Took {total_time_single:.2f}s for single process version - {n_timesteps / total_time_single:.2f} FPS")
print("Multiprocessed training is {:.2f}x faster!".format(total_time_single / total_time_multi))

The output is:

Took 16.39s for multiprocessed version - 610.18 FPS
Took 14.19s for single process version - 704.80 FPS
Multiprocessed training is 0.87x faster!

1 Answer

Try passing the SubprocVecEnv class as the vec_env_cls argument of make_vec_env.

By default, make_vec_env uses the DummyVecEnv wrapper to vectorize the environment. This does not create subprocesses; it steps each environment in sequence in the current Python process. That works well for simple environments such as CartPole, where the overhead of multiprocessing outweighs the environment computation time. For more computationally heavy environments, SubprocVecEnv is better, as it creates actual subprocesses.
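
A minimal sketch of that change, reusing the settings from the question. The helper names fps and train_and_time are made up for illustration and are not part of Stable Baselines 3. The sketch reads model.num_timesteps after training because PPO collects whole rollouts and may run past the requested step count, so the two runs do not necessarily execute the same number of steps. No particular speedup is implied; results depend on the machine and the environment.

```python
import time


def fps(n_timesteps, elapsed_seconds):
    """Environment steps per second (hypothetical helper)."""
    return n_timesteps / elapsed_seconds


def train_and_time(use_subproc, env_name="BipedalWalker-v3",
                   num_cpu=4, n_timesteps=10_000):
    """Train PPO once and return (elapsed_seconds, steps_actually_run).

    Hypothetical helper; the SB3 imports are kept inside the function
    so that fps() can be used without Stable Baselines 3 installed.
    """
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv

    # SubprocVecEnv runs each environment in its own process;
    # DummyVecEnv (the make_vec_env default) steps them one after another.
    vec_env_cls = SubprocVecEnv if use_subproc else DummyVecEnv
    env = make_vec_env(env_name, n_envs=num_cpu, vec_env_cls=vec_env_cls)

    model = PPO("MlpPolicy", env, verbose=0)
    start_time = time.time()
    model.learn(n_timesteps)
    elapsed = time.time() - start_time

    steps_run = model.num_timesteps  # may exceed n_timesteps
    env.close()  # shuts down the worker processes, if any
    return elapsed, steps_run


def main():
    for use_subproc in (False, True):
        elapsed, steps_run = train_and_time(use_subproc)
        name = "SubprocVecEnv" if use_subproc else "DummyVecEnv"
        print(f"{name}: {elapsed:.2f}s, {steps_run} steps, "
              f"{fps(steps_run, elapsed):.2f} FPS")
```

To run the comparison, call main() from under an if __name__ == "__main__": guard. SubprocVecEnv needs that guard on platforms where worker processes are started with spawn (for example Windows), because each worker re-imports the script.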