
I need some help with solving a problem that uses the Q-learning algorithm.

Problem description:

I have a rocket simulator in which the rocket takes random paths and sometimes crashes. The rocket has 3 different engines that can each be either on or off. Depending on which engine(s) are activated, the rocket flies in different directions.

Functions for turning the engines on and off are available:

[Image: the available engine on/off functions]

The task:

Construct a Q-learning controller that keeps the rocket facing up at all times.

A sensor that reads the angle of the rocket is available as input.

My solution:

I have the following states:

[Image: the discretized angle states of the rocket]

I also have the following actions:

  • all engines off
  • left engine on
  • right engine on
  • middle engine on
  • left and right on
  • left and middle on
  • right and middle on

And the following rewards:

  • Angle = 0: reward = 100
  • All other angles: reward = 0
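To make the setup concrete, here is a rough Python sketch of the Q-table and update rule I have in mind. The 16 angle bins, the hyperparameters, and the epsilon-greedy action choice are only illustrative assumptions, not my actual simulator code:

```python
import random
from collections import defaultdict

N_ANGLE_BINS = 16           # assumed discretization of the angle sensor
ACTIONS = range(7)          # the 7 engine combinations listed above
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # illustrative hyperparameters

Q = defaultdict(float)      # Q[(state, action)] -> estimated return

def reward(angle_bin):
    # 100 when the rocket points straight up, 0 otherwise (as proposed above)
    return 100 if angle_bin == 0 else 0

def choose_action(state):
    # epsilon-greedy: explore occasionally, otherwise take the best-known action
    if random.random() < EPSILON:
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, r, next_state):
    # standard one-step Q-learning update
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```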

Question:

Now to the question: is this a good choice of rewards and states? Can I improve my solution? Would it be better to give rewards for other angles as well?

Thanks in advance

mrjasmin
  • What's the goal of this game? Landing the rocket, like in a lunar lander? Or does it just need to fly around and not crash? – Thomas Jungblut Jun 11 '13 at 17:12
  • Hi! The goal is to make it face north and fly upwards. It's always flying, but I need to make it fly upwards. It must not crash. When it reaches north and can't go any further, it starts over. – mrjasmin Jun 11 '13 at 17:14
  • Exactly. With my current solution it flies upwards, but it's not optimal. – mrjasmin Jun 11 '13 at 17:29
  • Oops, I deleted my last comment. So the goal is to stay balanced and not hit the walls while under constant gravity. Thanks for the clarification. – Thomas Jungblut Jun 11 '13 at 17:31

2 Answers


16 states x 7 actions is a very small problem.

Rewards for other angles will help you learn faster, but can create odd behaviors later depending on your dynamics.

If the rocket has no momentum, you may be able to decrease the number of states, which will speed up learning and reduce memory usage (which is already tiny). To find a good number of states, try decreasing it while tracking a metric such as reward per timestep over multiple games, or mean error (normalized by starting angle) over multiple games. Some state representations may perform much better than others; if they don't, choose the one that converges fastest. This should be relatively cheap to test with your small Q-table.

If you want to learn quickly, you may also try Q(λ) or another variant of the algorithm that uses eligibility traces, so that the reward at angle 0 propagates back to earlier state-action pairs faster.
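For example, here is a rough sketch of Watkins's Q(λ) in Python; the constants and the bare state/action encoding are illustrative assumptions, not a drop-in implementation for your simulator:

```python
from collections import defaultdict

ACTIONS = range(7)                       # the 7 engine combinations
ALPHA, GAMMA, LAMBDA = 0.1, 0.9, 0.8     # illustrative hyperparameters
Q = defaultdict(float)                   # Q[(state, action)]
E = defaultdict(float)                   # eligibility traces

def q_lambda_update(state, action, r, next_state, next_action):
    # Watkins's Q(lambda): traces spread the sparse reward at angle 0 back
    # along recently visited state-action pairs, which speeds up learning.
    greedy_next = max(ACTIONS, key=lambda a: Q[(next_state, a)])
    delta = r + GAMMA * Q[(next_state, greedy_next)] - Q[(state, action)]
    E[(state, action)] += 1.0
    for key in list(E):
        Q[key] += ALPHA * delta * E[key]
        if next_action == greedy_next:
            E[key] *= GAMMA * LAMBDA     # decay traces after a greedy step
        else:
            E[key] = 0.0                 # cut traces after an exploratory step
```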

Edit: Depending on your dynamics, the angle alone may not be a Markov state, so the problem as posed may not be a proper Markov Decision Process. For example, you may need to include the current rotation rate in the state.
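A sketch of what that extended state could look like, assuming the angle and rotation rate arrive as plain numbers; the bin counts and the rate range are arbitrary assumptions:

```python
def make_state(angle_deg, rate_deg_per_s, n_angle_bins=16, n_rate_bins=5,
               max_rate=50.0):
    # Bin the angle into n_angle_bins slices of the circle.
    angle_bin = int((angle_deg % 360.0) / 360.0 * n_angle_bins)
    # Clamp the rotation rate into [-max_rate, max_rate] and bin it too.
    clamped = max(-max_rate, min(max_rate, rate_deg_per_s))
    rate_bin = int((clamped + max_rate) / (2 * max_rate) * (n_rate_bins - 1) + 0.5)
    return (angle_bin, rate_bin)
```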

Josh S.

Try putting smaller rewards on the states next to the desired state. This will get your agent to learn to fly upwards more quickly.
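For example, something along these lines; the 16-bin discretization and the fall-off values are assumptions you would tune for your simulator:

```python
N_ANGLE_BINS = 16   # assumed angle discretization

def shaped_reward(angle_bin):
    # Distance in bins from "facing up" (bin 0), wrapping around the circle.
    distance = min(angle_bin, N_ANGLE_BINS - angle_bin)
    if distance == 0:
        return 100                              # original reward for the upright state
    return max(0, 20 - 10 * (distance - 1))     # smaller rewards nearby, 0 far away
```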