I am currently working on optimizing reward values for the Q-learning I'm doing. Right now I consider two values that together determine a specific reward. Since this is work related, I can't specify the variable names I take into consideration. The reward takes the form reward = a + b, where a takes values from the list [10, 20, 40, 60, 80] and b can be any value from 0 to infinity, i.e. b ∈ [0, ∞). Even though the value of b will usually not be very large, it can take any value within that range.

So the situation is such that if, say, b = 1300 and a = 80, then reward = 1380 and the priority of a gets eclipsed by b. Is there some way I can formulate the reward so that a and b have equal priority, i.e. each contributes 50% of the reward?
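For concreteness, a minimal sketch of the imbalance (a and b here are the placeholder names from above, not the real, confidential ones):

```python
# Sketch of the reward as described, using the example numbers above.
a_values = [10, 20, 40, 60, 80]  # the discrete values a can take
a = 80                           # the largest value of a
b = 1300                         # b is unbounded: any value in [0, inf)

reward = a + b
print(reward)                    # 1380
print(a / reward, b / reward)    # ~0.058 vs ~0.942 -- b eclipses a
```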

1 Answer

One technique I would recommend, which should solve your problem, is to regularize the Q-values for both a and b. There are many ways to do that, but I think L1 or L2 regularization should work nicely here.

In short, L2 regularization adds a penalty term computed as the sum of the squares of the weights.

[Image: the L2 regularization formula (the loss plus λ times the sum of the squared weights), from chioka.in]
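As a minimal sketch of that penalty (the names `weights` and `lam` are illustrative, not your actual variables):

```python
import numpy as np

# L2 penalty as described above: lambda times the sum of the squared weights.
def l2_penalty(weights, lam=0.01):
    weights = np.asarray(weights, dtype=float)
    return lam * np.sum(weights ** 2)

# Tiny usage example with made-up weights.
print(l2_penalty([0.5, -1.2, 3.0]))  # 0.01 * (0.25 + 1.44 + 9.0) = 0.1069
```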

  • But could you help me by citing an example? Something like: initially, my value for a particular data point is 0. Then, for the same data point, the reward is found to have `a=80` and `b=1300` (just an example). How would I regularize the Q-value in this instance? I just didn't get clarity here. – pythonic_autometeor Jan 31 '18 at 10:12