I've implemented several reinforcement learning algorithms using TensorFlow. For continuous-action methods (e.g., DDPG, TD3), I ran them on BipedalWalker; for discrete-action methods (e.g., DQN, Rainbow), I ran them on Atari games.
I intentionally split training and environment interaction by running the network updates in a background thread (via the standard Python module `threading.Thread`); a minimal sketch of the setup is at the end of this post. I found this sped up the methods running on BipedalWalker, but it hurt the ones on Atari games. What causes this difference? Does it have something to do with Python's GIL?
One explanation I can conjecture is that background learning increases the update frequency, which makes the agent more likely to overfit and get stuck in a local optimum.
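For reference, here is roughly the pattern I'm describing. This is only a minimal sketch under simplifying assumptions: `Agent` and `ReplayBuffer` stand in for my actual classes, and the TensorFlow gradient update is assumed to happen inside `agent.train_step`.

```python
import threading

# Minimal sketch of the split: the main thread collects experience,
# a background thread continuously runs gradient updates.
# `Agent` and `ReplayBuffer` are simplified placeholders, not my real classes.

class LearnerThread(threading.Thread):
    """Background thread: sample from the shared replay buffer and train."""

    def __init__(self, agent, buffer):
        super().__init__(daemon=True)
        self.agent = agent
        self.buffer = buffer
        self.stop_event = threading.Event()

    def run(self):
        while not self.stop_event.is_set():
            if len(self.buffer) >= self.agent.batch_size:
                batch = self.buffer.sample(self.agent.batch_size)
                self.agent.train_step(batch)  # TensorFlow update runs here


def collect(agent, buffer, env, num_steps):
    """Main thread: interact with the environment and fill the replay buffer."""
    obs = env.reset()
    for _ in range(num_steps):
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs


# Usage (placeholder objects assumed to exist):
# learner = LearnerThread(agent, buffer)
# learner.start()
# collect(agent, buffer, gym.make("BipedalWalker-v3"), num_steps=1_000_000)
# learner.stop_event.set()
```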