
My SARSA with gradient descent keeps escalating the weights exponentially. At episode 4, step 17, the value is already NaN:

Exception: Qa is nan

For example:

6) Qa:
Qa = -2.00890180632e+303

7) NEXT Qa:
Next Qa with west = -2.28577776413e+303

8) THETA:
1.78032402991e+303 <= -0.1 + (0.1 * -2.28577776413e+303) - -2.00890180632e+303

9) WEIGHTS (sample):
5.18266630725e+302 <= -1.58305782482e+301 + (0.3 * 1.78032402991e+303 * 1)

I don't know where to look for the mistake I made. Here's some code FWIW:

def getTheta(self, reward, Qa, QaNext):
    """ theta = r + gamma * Qw(s',a') - Qw(s,a) """
    theta = reward + (self.gamma * QaNext) - Qa
    return theta


def updateWeights(self, Fsa, theta):
    """ wi <- wi + alpha * theta * Fi(s,a) """
    for i, w in enumerate(self.weights):
        self.weights[i] += (self.alpha * theta * Fsa[i])

I have about 183 binary features.
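For context, Qa and QaNext come from the usual linear approximation, i.e. the dot product of the weights with the binary feature vector. Roughly something like this (simplified, not my exact method):

def getQ(self, Fsa):
    """ Qw(s,a) = sum_i wi * Fi(s,a) over the ~183 binary features """
    return sum(w * f for w, f in zip(self.weights, Fsa))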

Tjorriemorrie
  • An answer is hardly possible given the provided info. I would try reducing alpha/theta, and look in detail at the quantities involved. – davidhigh May 21 '14 at 15:23
  • Are you doing the normalization step, or just adding to the weights? – NKN Aug 13 '14 at 09:54
  • @NKN thanks, your normalization step helps. Still new to this, I wish there was more documentation on that. – Tjorriemorrie Aug 14 '14 at 05:43

2 Answers


You need normalization in each trial. This will keep the weights in a bounded range (e.g. [0, 1]). The way you are adding to the weights each time just grows them, and they become useless after the first trial.

I would do something like this:

self.weights[i] += (self.alpha * theta * Fsa[i])
normalize(self.weights, wmin, wmax)

Or see the following example (from the RL literature):

[image: example from the RL literature]

You need to write the normalization function by yourself though ;)
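A minimal sketch of one possible normalize, min-max scaling the whole weight vector into [wmin, wmax] (adapt it to your own code):

def normalize(weights, wmin=0.0, wmax=1.0):
    """ min-max scale the weight vector into [wmin, wmax] in place """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return  # all weights equal, nothing to rescale
    for i, w in enumerate(weights):
        weights[i] = wmin + (w - lo) * (wmax - wmin) / (hi - lo)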

NKN
  • Could you please perhaps give the source of the literature? I would like to read more about it. – Tjorriemorrie Oct 21 '14 at 10:42
  • I would suggest this book: http://books.google.it/books?hl=en&lr=&id=UGUqcl8_T9QC&oi=fnd&pg=PP1&dq=reinforcement+learning+linear+function+approximation+lucian&ots=Xk47TPU8Ww&sig=88QfOYsStxB4gT1BByZqd5h97sQ&redir_esc=y#v=onepage&q=reinforcement%20learning%20linear%20function%20approximation%20lucian&f=false – NKN Oct 21 '14 at 11:37

I do not have access to the full code of your application, so I might be wrong, but I think I know where you are going wrong. First and foremost, normalization should not be necessary here. For the weights to blow up this soon suggests something is wrong with your implementation.

I think your update equation should be:

self.weights[:, action_i] = self.weights[:, action_i] + (self.alpha * theta * Fsa)

That is to say, you should be updating columns instead of rows, because rows are for states and columns are for actions in the weight matrix.
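A sketch of what that would look like, assuming self.weights is a NumPy array with one row per feature/state and one column per action, and Fsa is the binary feature vector for the current state:

import numpy as np

def updateWeights(self, Fsa, theta, action_i):
    """ w[:, a] <- w[:, a] + alpha * theta * F(s,a), updating only the taken action's column """
    # shapes assumed: self.weights (n_features, n_actions), Fsa (n_features,)
    self.weights[:, action_i] += self.alpha * theta * np.asarray(Fsa)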