I'm not sure how to build the target Q-values for a DDQN (Double DQN).
DQN is the online ("normal") network, TAR is the target network.
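For reference, a minimal sketch of the two-network setup I mean (layer sizes, state/action dimensions and the sync step are only illustrative, not my exact architecture):

import tensorflow as tf
from tensorflow.keras import layers

STATE_SIZE, N_ACTIONS = 4, 2   # placeholder sizes

def build_model():
    # Illustrative architecture only
    model = tf.keras.Sequential([
        layers.Dense(24, activation="relu", input_shape=(STATE_SIZE,)),
        layers.Dense(24, activation="relu"),
        layers.Dense(N_ACTIONS, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

DQN = build_model()                    # online ("normal") network
TAR = build_model()                    # target network
TAR.set_weights(DQN.get_weights())     # target weights copied from the online net periodically

The three predictions below are then computed on a sampled minibatch (c_states, actions, rewards, n_states):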
q_values = self.DQN.predict(c_states) # DQN batch predict Q on states
dqn_next = self.DQN.predict(n_states) # DQN batch predict Q on next_states
tar_next = self.TAR.predict(n_states) # TAR batch predict Q on next_states
I mainly found 2 versions:
Version 1:
q_values[i][actions[i]] = (rewards[i] + (GAMMA * np.amax(tar_next[i])))
Version 2:
act = np.argmax(dqn_next[i])
q_values[i][actions[i]] = (rewards[i] + (GAMMA * tar_next[i][act]))
Which one is correct? And why?
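To make the comparison concrete, here is a small self-contained sketch of both variants in vectorized form, with random NumPy arrays as stand-ins for the network outputs (all sizes and values are placeholders, not real predictions):

import numpy as np

batch_size, n_actions = 32, 2          # placeholder sizes
GAMMA = 0.99

q_values = np.random.rand(batch_size, n_actions)   # stand-in for DQN.predict(c_states)
dqn_next = np.random.rand(batch_size, n_actions)   # stand-in for DQN.predict(n_states)
tar_next = np.random.rand(batch_size, n_actions)   # stand-in for TAR.predict(n_states)
actions  = np.random.randint(0, n_actions, size=batch_size)
rewards  = np.random.rand(batch_size)

rows = np.arange(batch_size)

# Version 1: the target network both selects and evaluates the next-state action
targets_v1 = q_values.copy()
targets_v1[rows, actions] = rewards + GAMMA * tar_next.max(axis=1)

# Version 2: the online network selects the next-state action, the target network evaluates it
best_next = dqn_next.argmax(axis=1)
targets_v2 = q_values.copy()
targets_v2[rows, actions] = rewards + GAMMA * tar_next[rows, best_next]

As far as I can tell, the only difference is which network supplies the arg-max over next-state actions: TAR in Version 1, DQN in Version 2.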
Version 1 Links:
https://github.com/keon/deep-q-learning/blob/master/ddqn.py
https://pythonprogramming.net/training-deep-q-learning-dqn-reinforcement-learning-python-tutorial
Version 2 Links:
https://github.com/germain-hug/Deep-RL-Keras/blob/master/DDQN/ddqn.py
https://jaromiru.com/2016/11/07/lets-make-a-dqn-double-learning-and-prioritized-experience-replay/
EDIT: Many thanks. To clarify, this is how I labelled the two versions:
Q-learning:
q_values[i][actions[i]] = (rewards[i] + (GAMMA * np.amax(tar_next[i])))
SARSA:
act = np.argmax(dqn_next[i])
q_values[i][actions[i]] = (rewards[i] + (GAMMA * tar_next[i][act]))
EDIT: re-open 03/2020
I'm sorry, but I have to re-open this question. Maybe I misunderstood something, but the following source seems to show that my Version 2 (labelled SARSA above) is actually Double Q-learning?
Sutton & Barto, Reinforcement Learning: An Introduction (2nd edition, 2018), page 158, "Double Q-learning": http://incompleteideas.net/book/RLbook2018.pdf
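My rough reading of the tabular Double Q-learning update described there, sketched in Python (the table sizes, step size and discount are placeholders):

import numpy as np

n_states, n_actions = 10, 2        # placeholder sizes
alpha, gamma = 0.1, 0.99           # placeholder step size and discount

Q1 = np.zeros((n_states, n_actions))
Q2 = np.zeros((n_states, n_actions))

def double_q_update(s, a, r, s_next):
    # With probability 0.5 update Q1, otherwise Q2; one table selects
    # the arg-max action for the next state, the other table evaluates it.
    if np.random.rand() < 0.5:
        a_star = np.argmax(Q1[s_next])
        Q1[s, a] += alpha * (r + gamma * Q2[s_next, a_star] - Q1[s, a])
    else:
        a_star = np.argmax(Q2[s_next])
        Q2[s, a] += alpha * (r + gamma * Q1[s_next, a_star] - Q2[s, a])

This selection-by-one-table, evaluation-by-the-other pattern is what made me think Version 2 matches Double Q-learning.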