Using Stable Baselines 3:
Given that deterministic=True always returns the action with the highest probability, what does that mean for environments whose action space is Box, MultiBinary or MultiDiscrete, where the agent is supposed to select several action components at the same time? How does deterministic=True work in these environments, and does it work there in the way it is supposed to?
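To make the question concrete, here is a minimal sketch of what I would expect "highest probability" to mean for each of these spaces, assuming each action dimension has its own independent distribution (a Gaussian per dimension for Box, a categorical per dimension for MultiDiscrete, a Bernoulli per bit for MultiBinary). The function names are hypothetical and are not part of the Stable Baselines 3 API; this is plain Python for illustration, not the library's implementation.

```python
# Illustrative sketch only: what "take the most likely action" could mean
# per action space. All function names here are hypothetical.

def mode_box(mean):
    """Box: assumed diagonal Gaussian, so the mode is the mean vector."""
    return list(mean)

def mode_multi_discrete(logits_per_dim):
    """MultiDiscrete: assumed one categorical per dimension; argmax of each."""
    return [
        max(range(len(logits)), key=lambda i: logits[i])
        for logits in logits_per_dim
    ]

def mode_multi_binary(probs):
    """MultiBinary: assumed one Bernoulli per bit; 1 if p > 0.5, else 0."""
    return [1 if p > 0.5 else 0 for p in probs]

# Made-up distribution parameters, standing in for a policy network's output.
box_action = mode_box([0.3, -1.2])
md_action = mode_multi_discrete([[0.1, 2.0, -0.5], [1.5, 0.2]])
mb_action = mode_multi_binary([0.9, 0.2, 0.7])

print(box_action)  # [0.3, -1.2]
print(md_action)   # [1, 0]
print(mb_action)   # [1, 0, 1]
```

If this picture is right, the deterministic action is chosen per dimension rather than as a single most likely entry from one flat list of actions, and for Box it would be the mean, since a continuous action has no "highest probability" in the discrete sense. Is that what actually happens?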
This question is partly based on the question
What does "deterministic=True" in stable baselines3 library means?
and is potentially related to another question of mine:
Reinforcement learning deterministic policies worse than non deterministic policies