I'm trying to implement MCTS on OpenAI's Atari Gym environments, which requires the ability to plan: acting in the environment and then restoring it to a previous state. I read that this can be done with the RAM version of the games:
Recording the current state into a snapshot:
snapshot = env.ale.cloneState()
Restoring the environment to a specific state recorded in snapshot:
env.ale.restoreState(snapshot)
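For context, this is the save/act/restore pattern I'm after. The ToyEnv below is just a stand-in I wrote for illustration (its names like clone_state/restore_state are made up, not the ALE API), but it shows what I expect cloneState/restoreState to do for planning:

```python
import copy
import random

class ToyEnv:
    """A stand-in for ALE whose entire state is one counter.
    The class and method names here are hypothetical, for illustration only."""
    def __init__(self):
        self.position = 0

    def act(self, action):
        self.position += action          # advance the state
        return -abs(self.position)       # some reward signal

    def clone_state(self):
        return copy.deepcopy(self.position)  # snapshot of the full state

    def restore_state(self, snapshot):
        self.position = snapshot         # rewind to the snapshot

env = ToyEnv()
snap0 = env.clone_state()                # remember the starting state

for _ in range(10):                      # "plan": act randomly for a while
    env.act(random.choice([-1, 1]))

env.restore_state(snap0)                 # rewind
assert env.position == 0                 # back at the start, as MCTS needs
```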
so I tried it with the RAM version of Breakout:
import gym
import matplotlib.pyplot as plt

env = gym.make("Breakout-ram-v0")
env.reset()
print("initial state:")
plt.imshow(env.render('rgb_array'))
env.close()

# create first snapshot
snap0 = env.ale.cloneState()
Executing the code above shows the image of the start of the game, and we recorded that first state in snap0. Now let's play until the end:
while True:
    r = env.ale.act(env.action_space.sample())
    is_done = env.ale.game_over()
    if is_done:
        print("Whoops! We died!")
        break
print("final state:")
plt.imshow(env.render('rgb_array'))
Executing the code above shows the image of the end of the game. Now let's restore the environment to the first state:
env.ale.restoreState(snap0)
print("\n\nAfter loading snapshot")
plt.imshow(env.render('rgb_array'))
Instead of showing the image of the start of the game, it shows the same image of the end of the game. The environment is not reverting even though I restored the original first state.
If anyone has worked with ALE and saving/restoring states like this, I'd really appreciate help figuring out what I'm doing wrong. Thanks!