This is basically a duplicate of "Is there a way to implement an OpenAI's environment, where the action space changes at each step?", but I used the provided answer and ran into a problem. Following that answer, I defined the action space as a property:
from gym.spaces import MultiDiscrete
import numpy as np

@property
def action_space(self):
    print("Action space is being called")
    if self.for_valids:
        # Upper bounds come from the map's currently valid actions
        borders = self.Map.valid_action_space + 1
        return MultiDiscrete(borders)
    else:
        # Fallback: 90 sub-actions with 3 choices each, except the third
        # entry in each group of five, which only has 2
        return MultiDiscrete(np.array([3] * 90, dtype=int) - np.array([0, 0, 1, 0, 0] * 18, dtype=int))
This property is not called during the model's learn method (I used PPO); it is only called once, during the setup:
model = PPO('MlpPolicy', env, verbose=1, tensorboard_log=log_path)
I tested the property by sampling random actions from env.action_space and stepping the environment; no "illegal" actions are taken there, but plenty are taken after a few steps of the learning routine.
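For reference, my random-step test looked roughly like this (a minimal sketch; env is the custom environment instance, and reset/step follow the classic gym API):

obs = env.reset()
for _ in range(1000):
    # Accessing env.action_space here hits the property on every step,
    # so the sampled action respects the current valid bounds
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()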
I'm completely new to reinforcement learning, so I'm also curious whether it's smart to define a variable action space at all, regardless of whether it's technically possible.
Thanks.