I am trying to develop a custom gym environment for a reinforcement learning use case. In this environment, my main aim is to predict the state based on several actions that are taken at each step; that is, the observation depends on multiple actions from the action_space. I tried passing several actions to the environment as a Tuple of Box spaces, as shown below:

self.action_space = Tuple([Box(low=np.array([22]),high=np.array([25])),
                           Box(low=np.array([0]), high=np.array([230])),
                           Box(low=np.array([0]), high=np.array([33])),
                           Box(low=np.array([0]), high=np.array([3.5]))])

The environment was built successfully. However, when I tried to train a PPO model on it, I got the following error:

Error: AssertionError: The algorithm only supports (<class 'gym.spaces.box.Box'>, <class 'gym.spaces.discrete.Discrete'>, <class 'gym.spaces.multi_discrete.MultiDiscrete'>, <class 'gym.spaces.multi_binary.MultiBinary'>) as action spaces but Tuple(Box(22.0, 25.0, (1,), float32), Box(0.0, 230.0, (1,), float32), Box(0.0, 33.0, (1,), float32), Box(0.0, 3.5, (1,), float32)) was provided

Can anyone suggest how to handle multiple actions within a single action space, and explain what exactly the error means? My understanding was that gym spaces need to be passed within a Tuple, and every element I passed is a Box, yet the error is still raised.

Mario
  • You should provide the action space as a single Box, where you specify the lower and upper bound for each real-valued quantity, such as Box(np.array([22, 0, 0, 0]), np.array([25, 230, 33, 3.5])). See also: https://stackoverflow.com/questions/44404281/openai-gym-understanding-action-space-notation-spaces-box – Per Joachims Oct 15 '21 at 04:38
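
A minimal sketch of how the single-Box suggestion might be used inside the environment, written with NumPy only. The bounds come from the question; the names `LOW`, `HIGH`, and `unpack_action` are hypothetical, and the commented-out `spaces.Box(...)` line assumes `from gym import spaces`.

```python
import numpy as np

# Bounds of the four Box spaces from the question, merged into one vector each.
LOW = np.array([22.0, 0.0, 0.0, 0.0], dtype=np.float32)
HIGH = np.array([25.0, 230.0, 33.0, 3.5], dtype=np.float32)

# In the environment's __init__ (assuming `from gym import spaces`):
#     self.action_space = spaces.Box(low=LOW, high=HIGH, dtype=np.float32)


def unpack_action(action):
    """Clip a flat action of shape (4,) to the bounds and split it into four floats.

    Hypothetical helper, intended to be called at the top of step().
    """
    action = np.asarray(action, dtype=np.float32).reshape(4)
    clipped = np.clip(action, LOW, HIGH)
    return tuple(float(x) for x in clipped)


# Example: the first value is above its upper bound, so it is clipped.
first, second, third, fourth = unpack_action([30.0, 100.0, 10.0, 1.0])
print(first, second, third, fourth)
```

The clip is a defensive choice; whether it is needed depends on how the training library passes actions to step().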

2 Answers

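If you are using stable_baselines3, the issue likely comes from the fact that it does not support Tuple action spaces. A Dict will not help here either: as the error message lists, only Box, Discrete, MultiDiscrete, and MultiBinary are accepted as action spaces (Dict is supported for observations only).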

Dharman

Maybe this can help you:

        self.action_space = spaces.Box(low=np.float32(np.tile([0, 0, 0, 0, 0, 0], (25, 1))),
                                        high=np.float32(np.tile([1, 1, 30, 3, 350, 8], (25, 1))),
                                        dtype=np.float32)

When you sample from this action space, you get a NumPy array of shape (25, 6), something like this (rounded to one decimal place):

 [[7.000e-01 0.000e+00 2.700e+00 2.500e+00 1.865e+02 2.700e+00]
 [7.000e-01 4.000e-01 6.200e+00 2.600e+00 2.424e+02 5.100e+00]
 [5.000e-01 6.000e-01 2.680e+01 1.400e+00 5.270e+01 5.800e+00]
 [9.000e-01 6.000e-01 1.320e+01 1.900e+00 3.254e+02 7.400e+00]
 [1.000e-01 3.000e-01 2.800e+01 2.800e+00 1.197e+02 2.600e+00]
 [7.000e-01 2.000e-01 5.500e+00 1.500e+00 3.046e+02 6.000e+00]
 [6.000e-01 8.000e-01 5.000e-01 2.000e+00 2.810e+02 2.900e+00]
 [9.000e-01 9.000e-01 2.750e+01 2.200e+00 7.150e+01 1.300e+00]
 [2.000e-01 8.000e-01 2.410e+01 1.300e+00 1.843e+02 4.000e+00]
 [8.000e-01 6.000e-01 2.900e+01 2.000e+00 1.266e+02 7.100e+00]
 [8.000e-01 7.000e-01 3.900e+00 8.000e-01 3.105e+02 1.200e+00]
 [5.000e-01 1.000e+00 1.910e+01 2.300e+00 1.404e+02 2.700e+00]
 [3.000e-01 4.000e-01 7.100e+00 1.700e+00 2.591e+02 2.300e+00]
 [8.000e-01 9.000e-01 1.200e+01 2.600e+00 1.713e+02 7.000e+00]
 [0.000e+00 1.000e+00 1.660e+01 0.000e+00 1.912e+02 4.000e+00]
 [4.000e-01 0.000e+00 1.360e+01 2.600e+00 8.790e+01 2.000e-01]
 [6.000e-01 0.000e+00 2.750e+01 2.500e+00 2.577e+02 5.700e+00]
 [1.000e-01 9.000e-01 9.800e+00 1.000e+00 2.493e+02 1.100e+00]
 [8.000e-01 1.000e-01 1.000e+01 1.500e+00 2.122e+02 5.400e+00]
 [5.000e-01 3.000e-01 2.700e+01 6.000e-01 2.810e+01 9.000e-01]
 [4.000e-01 1.000e-01 2.330e+01 1.500e+00 1.339e+02 1.400e+00]
 [6.000e-01 9.000e-01 1.900e+00 2.400e+00 7.430e+01 6.900e+00]
 [9.000e-01 8.000e-01 2.910e+01 4.000e-01 1.926e+02 3.200e+00]
 [3.000e-01 1.000e-01 1.110e+01 1.400e+00 3.198e+02 1.600e+00]
 [6.000e-01 1.000e+00 2.580e+01 2.700e+00 6.220e+01 5.700e+00]]
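
The shape claim above can be checked without installing gym. The sketch below rebuilds the same bounds with `np.tile` and draws uniformly between them as a stand-in for `action_space.sample()`; treating the two as equivalent is an assumption, and the variable names are illustrative only.

```python
import numpy as np

# Same bounds as in the answer: 25 rows, each holding 6 bounded values.
low = np.tile([0, 0, 0, 0, 0, 0], (25, 1)).astype(np.float32)
high = np.tile([1, 1, 30, 3, 350, 8], (25, 1)).astype(np.float32)

# Stand-in for action_space.sample(): draw uniformly between the bounds.
rng = np.random.default_rng(seed=0)
action = rng.uniform(low, high).astype(np.float32)

print(action.shape)

# Each row is one group of 6 values; each column shares one (low, high) pair.
first_row = action[0]
third_column = action[:, 2]
print(first_row.shape, third_column.shape)
```

Inside step(), the environment would index the action the same way, by row for each group and by column for each quantity.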

Also, remember to use an algorithm that supports this kind of space. The stable-baselines3 documentation for each algorithm (for example, A2C) lists the action and observation spaces it supports.