I need some help with solving a problem that uses the Q-learning algorithm.
Problem description:
I have a rocket simulator in which the rocket takes random paths and sometimes crashes. The rocket has 3 different engines that can each be either on or off. Depending on which engine(s) are activated, the rocket flies in different directions.
Functions for turning the engines on/off are available.
The task:
Construct a Q-learning controller that keeps the rocket facing up at all times.
A sensor that reads the angle of the rocket is available as input.
My solution:
I have the following states:
I also have the following actions (a rough encoding sketch follows the list):
- all engines off
- left engine on
- right engine on
- middle engine on
- left and right on
- left and middle on
- right and middle on
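To make this concrete, here is roughly how I encode the actions as on/off flags for the three engines. The `apply_action` helper and the `rocket.set_engine` call are just placeholders for whatever engine on/off functions the simulator actually provides:

```python
# Each action is a (left, middle, right) on/off triple.
ACTIONS = [
    (0, 0, 0),  # all engines off
    (1, 0, 0),  # left engine on
    (0, 0, 1),  # right engine on
    (0, 1, 0),  # middle engine on
    (1, 0, 1),  # left and right on
    (1, 1, 0),  # left and middle on
    (0, 1, 1),  # right and middle on
]

def apply_action(rocket, action):
    """Turn each engine on or off; `rocket.set_engine` is a stand-in
    for the simulator's real engine on/off functions."""
    left, middle, right = action
    rocket.set_engine("left", left)
    rocket.set_engine("middle", middle)
    rocket.set_engine("right", right)
```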
And the following rewards:
- Angle = 0: reward = 100
- All other angles: reward = 0
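And here is a sketch of the reward function and the tabular Q-update I had in mind, using the ACTIONS list above. The angle discretization into 10-degree bins, the learning rate, the discount factor, and the exploration rate are all just assumptions on my part, not fixed yet:

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.9    # discount factor (assumed value)
EPSILON = 0.1  # exploration rate for epsilon-greedy (assumed value)

def reward(angle):
    # 100 when the rocket points straight up, 0 otherwise
    return 100 if angle == 0 else 0

def discretize(angle):
    # Assumed state encoding: round the sensor angle to the nearest 10 degrees
    return int(round(angle / 10.0)) * 10

Q = defaultdict(float)  # Q[(state, action_index)] -> value

def choose_action(state):
    # epsilon-greedy selection over the 7 actions
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

def update(state, action, r, next_state):
    # Standard one-step Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(next_state, a)] for a in range(len(ACTIONS)))
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```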
Question:
Now to the question: is this a good choice of states and rewards? Can I improve my solution? Would it be better to give non-zero rewards for other angles as well?
Thanks in advance