
When using the MountainCar-v0 environment from OpenAI Gym in Python, the value done becomes true after 200 time steps. Why is that? Since the goal state hasn't been reached, the episode shouldn't be done.

import gym

env = gym.make('MountainCar-v0')
env.reset()
for t in range(300):
    env.render()
    res = env.step(env.action_space.sample())
    print(t)
    print(res[2])  # the done flag: becomes True at step 200

I want to run the step method until the car has reached the flag, and then break out of the loop. Is this possible? Something similar to this:

n_episodes = 10
for i in range(n_episodes):
    env.reset()
    done = False  # reset the flag for every episode, not just once
    while not done:
        env.render()
        state, reward, done, _ = env.step(env.action_space.sample())
needRhelp

2 Answers


Current versions of gym force-stop the environment after 200 steps, even if you don't use env.monitor. To avoid this, access the unwrapped environment with env = gym.make("MountainCar-v0").env

Scitator
  • How did you know you could add ".env" to the end to get different behaviour? I can't find any info on this anywhere but I'm curious what it is. – Danny Tuppeny Jul 28 '19 at 12:34
  • 1
    Oh, I found this.. the time limit is added as a wrapper, and `.env` accesses the environment that was wrapped: https://github.com/openai/gym/blob/c7f9edf943174387b2336ec7f7bc15e0ecac16f8/gym/envs/registration.py#L110 – Danny Tuppeny Jul 28 '19 at 12:41
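To make the mechanism from the linked source concrete, here is a minimal self-contained sketch (NOT gym's actual code, class names are illustrative only) of how a time-limit wrapper forces done=True after a fixed number of steps, and why accessing `.env` bypasses it:

```python
class CarEnv:
    """Stand-in for the raw environment: it never finishes on its own."""
    def reset(self):
        self._pos = 0.0
        return self._pos

    def step(self, action):
        self._pos += 0.01
        return self._pos, -1.0, False, {}   # raw env: done is always False


class TimeLimit:
    """Counts steps and overrides done once the limit is hit."""
    def __init__(self, env, max_episode_steps=200):
        self.env = env                      # this is what `.env` exposes
        self.max_episode_steps = max_episode_steps

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_episode_steps:
            done = True                     # force-stop regardless of the env
        return state, reward, done, info


wrapped = TimeLimit(CarEnv(), max_episode_steps=200)
wrapped.reset()
for _ in range(200):
    _, _, done, _ = wrapped.step(0)
print(done)       # True: the wrapper ended the episode at step 200

raw = wrapped.env # analogous to gym.make("MountainCar-v0").env
raw.reset()
for _ in range(300):
    _, _, done, _ = raw.step(0)
print(done)       # False: the unwrapped env never times out
```

This is why gym.make("MountainCar-v0") stops at 200 steps while the object reached through `.env` keeps running.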

Copied from https://github.com/openai/gym/wiki/FAQ:

Environments are intended to have various levels of difficulty, in order to benchmark the ability of reinforcement learning agents to solve them. Many of the environments are beyond the current state of the art, so don't expect to solve all of them. (If you do, please apply).

If you want to experiment with a variant of an environment that behaves differently, you should give it a new name so that you won't erroneously compare your agent running on an easy variant to someone else's agent running on the original environment. For instance, the MountainCar environment is hard partly because there's a limit of 200 timesteps after which it resets to the beginning. Successful agents must solve it in less than 200 timesteps. For testing purposes, you could make a new environment MountainCarMyEasyVersion-v0 with different parameters by adapting one of the calls to register found in gym/gym/envs/__init__.py:

gym.envs.register(
    id='MountainCarMyEasyVersion-v0',
    entry_point='gym.envs.classic_control:MountainCarEnv',
    max_episode_steps=250,      # MountainCar-v0 uses 200
    reward_threshold=-110.0,
)
env = gym.make('MountainCarMyEasyVersion-v0')

Because these environment names are only known to your code, you won't be able to upload it to the scoreboard.
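A rough self-contained sketch (NOT gym's registry code; the stub class and function bodies here are assumptions for illustration) of what gym.envs.register and gym.make do with max_episode_steps — store a spec under the id, then wrap the constructed env in a time limit on make():

```python
class MountainCarStub:
    """Stand-in for the real entry_point class MountainCarEnv."""
    def reset(self):
        return 0.0
    def step(self, action):
        return 0.0, -1.0, False, {}   # never finishes on its own

class TimeLimit:
    """Forces done=True once max_episode_steps is reached."""
    def __init__(self, env, max_episode_steps):
        self.env, self.limit, self.elapsed = env, max_episode_steps, 0
    def reset(self):
        self.elapsed = 0
        return self.env.reset()
    def step(self, action):
        state, reward, done, info = self.env.step(action)
        self.elapsed += 1
        return state, reward, done or self.elapsed >= self.limit, info

registry = {}

def register(id, entry_point, max_episode_steps, **kwargs):
    registry[id] = (entry_point, max_episode_steps)  # store the spec

def make(id):
    entry_point, limit = registry[id]
    return TimeLimit(entry_point(), limit)           # wrap on construction

register(id='MountainCarMyEasyVersion-v0',
         entry_point=MountainCarStub,
         max_episode_steps=250)

env = make('MountainCarMyEasyVersion-v0')
env.reset()
done, n = False, 0
while not done:
    _, _, done, _ = env.step(0)
    n += 1
print(n)   # 250: the episode now times out at the new limit
```

Registering under a new id leaves the original MountainCar-v0 spec (with its 200-step limit) untouched, which is the point of giving the easy variant its own name.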

catherio