I ran the multiprocessing example for Stable Baselines 3 and everything worked fine: https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb#scrollTo=pUWGZp3i9wyf
With num_cpu=4, multiprocessed training was approximately 3.6x faster than single-process training.
But when I use PPO instead of A2C, and BipedalWalker-v3 instead of CartPole-v1, the multiprocessed version performs worse than the single-process one. My question is: what am I doing wrong? Why is it slower?
My code is:
```python
import time

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

env_name = "BipedalWalker-v3"
num_cpu = 4
n_timesteps = 10000

# Vectorized environment with num_cpu copies (the "multiprocessed" run)
env = make_vec_env(env_name, n_envs=num_cpu)
model = PPO('MlpPolicy', env, verbose=0)

start_time = time.time()
model.learn(n_timesteps)
total_time_multi = time.time() - start_time
print(f"Took {total_time_multi:.2f}s for multiprocessed version - {n_timesteps / total_time_multi:.2f} FPS")

# Single environment created from the env id (the "single process" run)
single_process_model = PPO('MlpPolicy', env_name, verbose=0)

start_time = time.time()
single_process_model.learn(n_timesteps)
total_time_single = time.time() - start_time
print(f"Took {total_time_single:.2f}s for single process version - {n_timesteps / total_time_single:.2f} FPS")

print("Multiprocessed training is {:.2f}x faster!".format(total_time_single / total_time_multi))
```
The output is:
```
Took 16.39s for multiprocessed version - 610.18 FPS
Took 14.19s for single process version - 704.80 FPS
Multiprocessed training is 0.87x faster!
```
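
One thing that may skew this comparison is that the FPS figures divide `n_timesteps` by the wall time, while PPO collects experience in whole rollouts of `n_steps * n_envs` steps and only stops once its step counter reaches the requested total. The sketch below is plain arithmetic, not Stable Baselines 3 code: the helper `steps_actually_collected` is a hypothetical name, and it assumes PPO keeps its default `n_steps=2048`, which the code above does not override.

```python
import math


def steps_actually_collected(total_timesteps, n_envs, n_steps=2048):
    """Estimate how many environment steps run before learn() returns.

    Assumption: rollouts are collected in whole batches of
    n_steps * n_envs, and training stops once the step counter
    reaches total_timesteps.
    """
    rollout_size = n_steps * n_envs
    return math.ceil(total_timesteps / rollout_size) * rollout_size


n_timesteps = 10000
multi_steps = steps_actually_collected(n_timesteps, n_envs=4)   # 16384
single_steps = steps_actually_collected(n_timesteps, n_envs=1)  # 10240

# Wall times copied from the output above.
print(f"4 envs: {multi_steps} steps, {multi_steps / 16.39:.0f} steps/s")
print(f"1 env:  {single_steps} steps, {single_steps / 14.19:.0f} steps/s")
```

Under that assumption the 4-environment run performs about 1.6x as many environment steps as the single-environment run, so the two timings are not measuring the same amount of work. Separately, `make_vec_env` builds a `DummyVecEnv` by default, which steps all environments sequentially in one process. Passing `vec_env_cls=SubprocVecEnv` (imported from `stable_baselines3.common.vec_env`) is what places each environment in its own process, and whether that ends up faster depends on how expensive each environment step is compared with the inter-process communication overhead.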