I have seen other, older posts circling around this topic, but the general answer was along the lines of "don't worry about it, just punish the agent and let the NN learn that in that specific state it cannot take some actions!". Well, I don't like that, for several reasons:
- many publications write the action space not as A but as A(s), so it is perfectly normal to treat the action space as a function of the current state s;
- in reality, if you have a wall on your left it is not just a matter of hurting yourself when you try to pass through it: you simply do not have that option. I cannot understand why my RL agent should still have a chance, however small, of going left after training;
- why should we accept extra learning effort to make the agent learn something that is already known?
just to mention a few. I saw that the Discrete space defined in the Gymnasium library accepts a mask array to declare which actions are available, but as far as I can tell it is only used in the random sampling function:
def sample(self, mask: Optional[np.ndarray] = None) -> int:
    """Generates a single random sample from this space.

    A sample will be chosen uniformly at random with the mask if provided
    ...
    """
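As far as I understand, that mask only changes how space.sample() draws random actions, for example (a minimal sketch; the 4-action space and the "wall on the left" mask values are just made up for illustration):

import numpy as np
import gymnasium as gym

space = gym.spaces.Discrete(4)                # hypothetical actions: 0=up, 1=down, 2=left, 3=right
mask = np.array([1, 1, 0, 1], dtype=np.int8)  # in this state "left" is simply not available
action = space.sample(mask=mask)              # index 2 can never be drawn here

This says nothing about what the learned policy itself is allowed to do during or after training.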
Instead, I think that implementing a "dynamic" action space as a function of the current state should somehow affect agent.collect_policy during training, not just random sampling. I am struggling to find complete, working examples of how to implement such a seemingly simple capability. In the end it is not so simple for me, and I would like to know whether elegant solutions have already been developed, even if (like many other things, regrettably) not well documented, in the TF-Agents / TensorFlow context.
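To make it a bit more concrete, this is roughly what I am imagining in TF-Agents (a minimal sketch: I came across the observation_and_action_constraint_splitter argument of DqnAgent and I assume it is meant for this, but I am not sure it is the intended or only way; the observation keys 'state' and 'valid_actions' are my own naming):

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import q_network
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

# The environment would pack a per-step validity mask into the observation.
observation_spec = {
    'state': tensor_spec.TensorSpec(shape=(4,), dtype=tf.float32, name='state'),
    'valid_actions': tensor_spec.TensorSpec(shape=(3,), dtype=tf.int32, name='valid_actions'),
}
time_step_spec = ts.time_step_spec(observation_spec)
action_spec = tensor_spec.BoundedTensorSpec(shape=(), dtype=tf.int32, minimum=0, maximum=2)

def splitter(observation):
    # Split the observation into (network input, action mask).
    return observation['state'], observation['valid_actions']

# The Q-network only ever sees the 'state' part of the observation.
q_net = q_network.QNetwork(observation_spec['state'], action_spec)

agent = dqn_agent.DqnAgent(
    time_step_spec,
    action_spec,
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    observation_and_action_constraint_splitter=splitter,
)
agent.initialize()
# If I understand correctly, agent.policy and agent.collect_policy would then
# only ever select actions whose mask entry is 1.

But I could not find a complete, documented end-to-end example of this, nor clear guidance on whether the same idea applies beyond DQN-style agents, which is exactly what I am asking for.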