
I can't find an exact description of the differences between the OpenAI Gym environments 'CartPole-v0' and 'CartPole-v1'.

Both environments have separate official websites dedicated to them (see 1 and 2), though I can only find one implementation, without any version identification, in the gym GitHub repository (see 3). I also checked with the debugger which files are actually loaded, and both environments seem to load that same file. The only difference seems to be in their internally assigned max_episode_steps and reward_threshold, which can be accessed as shown below. CartPole-v0 has the values 200/195.0 and CartPole-v1 has the values 500/475.0. The rest seems identical at first glance.

import gym

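env = gym.make("CartPole-v1")
# The spec is an attribute of the environment returned by gym.make
print(env.spec.max_episode_steps)  # 500 for CartPole-v1, 200 for CartPole-v0
print(env.spec.reward_threshold)   # 475.0 for CartPole-v1, 195.0 for CartPole-v0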

I would therefore appreciate it if someone could describe the exact differences for me or point me to a website that does. Thank you very much!

J_H
Paul Pauls

1 Answer


As you probably have noticed, in OpenAI Gym sometimes there are different versions of the same environments. The different versions usually share the main environment logic but some parameters are configured with different values. These versions are managed using a feature called the registry.
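To make the idea concrete, here is a toy model of such a registry. It is only a minimal sketch and not Gym's actual implementation: `ToySpec`, `toy_register`, `TOY_REGISTRY`, the `Toy-v0`/`Toy-v1` IDs, and all the values are hypothetical names and numbers invented for illustration. It shows how two IDs can point to the same entry point while carrying different parameters.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical stand-in for an environment spec (not Gym's own class)."""
    env_id: str
    entry_point: str
    max_episode_steps: Optional[int] = None
    reward_threshold: Optional[float] = None


# Maps an environment ID to its spec.
TOY_REGISTRY: Dict[str, ToySpec] = {}


def toy_register(env_id: str, **kwargs) -> None:
    """Store a spec under its ID and reject duplicate IDs."""
    if env_id in TOY_REGISTRY:
        raise ValueError(f"{env_id} is already registered")
    TOY_REGISTRY[env_id] = ToySpec(env_id=env_id, **kwargs)


# Two versions that share one entry point but differ in their parameters.
# The numbers are arbitrary illustration values.
toy_register("Toy-v0", entry_point="toy.envs:ToyEnv",
             max_episode_steps=10, reward_threshold=9.0)
toy_register("Toy-v1", entry_point="toy.envs:ToyEnv",
             max_episode_steps=20, reward_threshold=18.0)

v0, v1 = TOY_REGISTRY["Toy-v0"], TOY_REGISTRY["Toy-v1"]
print(v0.entry_point == v1.entry_point)            # same environment logic
print(v0.max_episode_steps, v1.max_episode_steps)  # different configuration
```

The design point is that the version suffix lives in the ID and its attached parameters, not in the environment class itself, which is why only one source file exists.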

In the case of the CartPole environment, you can find the two registered versions in this source code. As you can see in lines 50 to 65, there are two CartPole versions, tagged as v0 and v1, which share the same entry point and differ only in the parameters max_episode_steps and reward_threshold:

register(
    id='CartPole-v0',
    entry_point='gym.envs.classic_control:CartPoleEnv',
    max_episode_steps=200,
    reward_threshold=195.0,
)

register(
    id='CartPole-v1',
    entry_point='gym.envs.classic_control:CartPoleEnv',
    max_episode_steps=500,
    reward_threshold=475.0,
)

Both parameters confirm your guess about the difference between CartPole-v0 and CartPole-v1.
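As a quick check, the two register calls above can be compared mechanically. The snippet below is a small sketch that copies their keyword arguments into plain dictionaries (the names `cartpole_v0`, `cartpole_v1`, `differing`, and `ratios` are just for this example) and lists the keys that differ. The ratio at the end relies on an assumption not stated above: CartPole gives a reward of +1 per step, so max_episode_steps is also the highest return a single episode can reach.

```python
# Keyword arguments copied from the two register(...) calls above.
cartpole_v0 = {
    "entry_point": "gym.envs.classic_control:CartPoleEnv",
    "max_episode_steps": 200,
    "reward_threshold": 195.0,
}
cartpole_v1 = {
    "entry_point": "gym.envs.classic_control:CartPoleEnv",
    "max_episode_steps": 500,
    "reward_threshold": 475.0,
}

# Collect every key whose value differs between the two versions.
differing = {
    key: (cartpole_v0[key], cartpole_v1[key])
    for key in cartpole_v0
    if cartpole_v0[key] != cartpole_v1[key]
}
for key, (old, new) in differing.items():
    print(f"{key}: v0={old}, v1={new}")

# Assumption: +1 reward per step, so the step cap is the maximum return.
# The ratio shows how close to that cap the threshold sits.
ratios = {}
for name, cfg in (("CartPole-v0", cartpole_v0), ("CartPole-v1", cartpole_v1)):
    ratios[name] = cfg["reward_threshold"] / cfg["max_episode_steps"]
    print(f"{name}: threshold is {ratios[name]:.1%} of the step cap")
```

Under that assumption, v1 is the longer task: an episode can run 2.5 times as many steps before it is cut off, and the threshold scales with it.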

Pablo EM
  • 1
    Thank you very much Pablo, very helpful answer and well supported! You don't also happen to know the exact reason why the two are different? Though now that I know these two variables are the only difference, my main concern is cleared up. – Paul Pauls Jul 09 '19 at 17:57
  • 1
    Welcome, it's a pleasure to be helpful. Actually I don't know the reason, maybe they both appeared in different research papers. I guess it's possible to investigate the origin of each configuration. – Pablo EM Jul 09 '19 at 18:19