
This is basically a duplicate of Is there a way to implement an OpenAI's environment, where the action space changes at each step?, but I used the answer provided there and found that the

@property
def action_space(self):
    print("Action space is being called")
    if self.for_valids:
        borders = self.Map.valid_action_space + 1
        return MultiDiscrete(borders)
    else:
        return MultiDiscrete(np.array([3] * 90, dtype=int) - np.array([0, 0, 1, 0, 0] * 18, dtype=int))

method I defined is not called during the model's learn() call (I used PPO), but only once during the

model = PPO('MlpPolicy', env, verbose=1, tensorboard_log=log_path)

setup. I tested the method by taking random steps sampled from env.action_space, and no "illegal" actions are taken there, but plenty are taken after a few steps of the learning routine.
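To show what I mean, here is a simplified sketch of my random-step test, without gym and with a toy stand-in for my env (the class, its bounds, and the call counter are made up for illustration; `self.Map.valid_action_space` in my real env plays the role of `self.borders` here). When the property is re-read at every step, sampled actions always respect the current bounds:

```python
import numpy as np

class ToyEnv:
    """Toy stand-in for my env: per-dimension action bounds change
    every step, mimicking a dynamic MultiDiscrete action space."""
    def __init__(self):
        self.calls = 0                       # counts reads of action_space
        self.borders = np.array([3, 3, 2])   # current exclusive upper bounds

    @property
    def action_space(self):
        self.calls += 1                      # the print() in my real code
        return self.borders                  # stands in for MultiDiscrete(borders)

    def sample(self):
        # like env.action_space.sample(): draw within the *current* bounds
        return np.array([np.random.randint(b) for b in self.action_space])

    def step(self, action):
        assert np.all(action < self.borders), "illegal action"
        # bounds change between steps, like my valid_action_space does
        self.borders = np.random.randint(1, 4, size=3)

env = ToyEnv()
for _ in range(100):
    env.step(env.sample())   # never trips the assert, because the
                             # property is re-queried before each sample
print(env.calls)
```

The assert never fires in this loop, and `calls` ends up at 100 (one read per sample). My problem is that PPO apparently reads the property once at construction and never again, so it keeps sampling from the stale bounds.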

I'm also completely new to reinforcement learning, so I'm also curious whether defining a variable action space is a good idea in the first place, regardless of whether it's possible.

Thanks

Rubén
