
I have just started to study Q-learning and see the possibility of using it to solve my problem.

Problem: I am supposed to detect certain combinations of data. I have four matrices that act as inputs to my system, and I have already categorised the inputs (each input can be either Low (L) or High (H)). I need to detect certain types of input, for example LLLH, LLHH, HHHH, etc.

NOTE: 1) LLLH means the first input is L, the second input is L, the third input is L, and the fourth input is H. 2) I have labelled each type of input as a state; for example, LLLL is state 1, LLLH is state 2, and so on.
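For illustration, here is a minimal sketch in Python of the labelling I have in mind (my own convention here: H = 1, L = 0, so the four inputs read as a 4-bit binary number, giving 2^4 = 16 states):

    def combination_to_state(combo):
        # 'LLLL' -> 0, 'LLLH' -> 1, ..., 'HHHH' -> 15
        # (zero-indexed; "state 1" above corresponds to index 0)
        bits = ['1' if c == 'H' else '0' for c in combo]
        return int(''.join(bits), 2)

    print(combination_to_state('LLLH'))  # 1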

What I have studied in Q-learning is that most of the time you have one goal (only one state as a goal), which makes it easier for the agent to learn and create the Q-matrix from the R-matrix. Now in my problem I have many goals (many states act as goals and need to be detected). I don't know how to design the states, how to create the reward matrix with many goals, or how the agent will learn. Can you please help me see how I can use Q-learning in this kind of situation, taking into account that I have something like 16 goals in 20+ states?

As I mentioned above, I know what Q-learning is, how the states and the goal work, and how the Q-matrix is calculated (how it learns). The problem is that now I have many goals, and I don't really know how to relate my problem to Q-learning: how many states do I need, and how do I assign the rewards when I have many goals?

I need help with at least how to create a reward matrix with many goals.

  • Multiple goals should not be a blocker for a Q-learning setup as long as you have the correct utility when taking a given action in a given state. However, compared to having only one goal, you might need more iterations to learn an optimal policy. What is not clear in your statement is: (1) what are the possible actions at a given state? (2) what is a state? If LLLH is a state, you have at most 2^4 = 16 states? – greeness Nov 14 '13 at 23:06
  • Correct, I have 16 states, of which 15 are goals, and the input received determines the next move of the agent. For example, if we are currently in state LLLH and the next input received is HLHL, then the agent must move from state LLLH (a goal state) to HLHL (another goal state). That's why I am confused: I don't know which move is optimal in each state, as I have many goals and the move I make depends on the input received! – user2994193 Nov 15 '13 at 05:55
  • As a valid action, can you go from any state to any other state? Or are you limited to flipping only 1 or 2 bits of your LLLH? I am asking because if you can go from any state to any other, then it does not make sense to me: the optimal policy for Q(S,A) will be proportional to Utility(A), so we don't need to do any Q-learning. – greeness Nov 15 '13 at 08:35
  • Yes, from one state you can go to any other state (depending on the input value received)! That's why I am confused about how to implement Q-learning in this situation! – user2994193 Nov 15 '13 at 10:54
  • In that case, is your utility dependent only on the state, or is the utility (reward) also dependent on the action that leads you to the state? If the utility depends on both state and action, then it might make sense. Otherwise, why not just jump to the state that has the maximum reward no matter which state you are in now? – greeness Nov 15 '13 at 11:04

2 Answers


I need help with at least how to create a reward matrix with many goals.

The simplest way is to define a reward for each goal and then take a weighted sum of those rewards to form a total reward.

Rtot = w1 * R1 + w2 * R2 + ... + wn * Rn

You can then decide how to weight each reward; the weighting affects the final behavior of the agent, because each choice makes it try to learn something different.
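For example, here is a minimal sketch in Python (the goal states and weights below are placeholders I made up, not from the question):

    import numpy as np

    n_states = 16                        # e.g. all L/H combinations of 4 inputs
    goals = {1: 5.0, 7: 2.0, 15: 10.0}   # hypothetical goal state -> weight

    # One reward matrix per goal: any transition into goal g pays 1,
    # so the total reward is the weighted sum Rtot = w1*R1 + ... + wn*Rn.
    R_total = np.zeros((n_states, n_states))
    for g, w in goals.items():
        R_g = np.zeros((n_states, n_states))
        R_g[:, g] = 1.0
        R_total += w * R_g

    print(R_total[0, 15])  # 10.0: moving from state 0 into goal state 15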

There are more sophisticated approaches, called "Multi-dimensional Reward RL" or "Multi-criteria RL". You can Google these terms and find related papers.

– NKN

Multiple goals are being actively investigated, as they solve some critical RL problems.

Here is a great article where the goal is to deliver packages or recharge the battery. If you don't recharge, the deliveries will fail, but if you constantly charge, you will not make any deliveries. It is a balance between these two important goals.

The author talks you through the logic and approach in TensorFlow: https://www.oreilly.com/ideas/reinforcement-learning-for-complex-goals-using-tensorflow
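As a rough sketch of that trade-off collapsed into a single scalar reward (my own toy numbers, not the article's code):

    # Toy reward combining the two competing goals from the article's example.
    def reward(delivered, battery_level):
        r = 10.0 if delivered else 0.0  # payoff for completing a delivery
        if battery_level <= 0.0:
            r -= 50.0                   # heavy penalty for running the battery flat
        return r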

– mazecreator