What is the difference between deep reinforcement learning and reinforcement learning? I basically know what reinforcement learning is about, but what does the term "deep" concretely stand for in this context?
2 Answers
Reinforcement Learning
In reinforcement learning, an agent tries to come up with the best action given a state.
For example, in the video game Pac-Man, the state would be the 2D game world you are in and the surrounding items (pac-dots, enemies, walls, etc.), and the actions would be moving through that 2D space (going up/down/left/right).
So, given the state of the game world, the agent needs to pick the best action to maximise rewards. Through reinforcement learning's trial and error, it accumulates "knowledge" about (state, action) pairs: it learns whether a given (state, action) pair leads to positive or negative reward. Let's call this value Q(state, action).
A rudimentary way to store this knowledge would be a table like below
state | action | Q(state, action)
---------------------------------
... | ... | ...
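As a minimal sketch, such a table can be held in a dictionary keyed by (state, action) pairs. The state and action names below are purely illustrative:

```python
from collections import defaultdict

# Tabular Q: one entry per (state, action) pair, unseen pairs default to 0.0
Q = defaultdict(float)

# Toy entries for a Pac-Man-like grid (values are made up for illustration)
Q[("cell_3_4", "up")] = 1.5      # moving up from this cell tends to earn reward
Q[("cell_3_4", "down")] = -0.7   # moving down leads toward an enemy

def best_action(state, actions):
    """Pick the action with the highest stored Q-value for this state."""
    return max(actions, key=lambda a: Q[(state, a)])

print(best_action("cell_3_4", ["up", "down"]))  # -> up
```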
The (state, action) space can be very big
However, when the game gets complicated, the knowledge space can become huge and it is no longer feasible to store all (state, action) pairs. If you think about it in raw terms, even a slightly different state is still a distinct state (e.g. a different position of the enemy coming through the same corridor). Instead of storing and looking up every little distinct state, you could use something that generalizes the knowledge.
So, what you can do is create a neural network that predicts the reward for an input (state, action) pair (or picks the best action given a state, depending on how you like to look at it).
Approximating the Q value with a Neural Network
So, what you effectively have is a NN that predicts the Q value based on the input (state, action) pair. This is far more tractable than storing every possible value in a table as above.
Q = neural_network.predict(state, action)
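To make this concrete, here is a rough sketch of such a network: a tiny fully connected net mapping a (state, action) feature vector to a single Q estimate. The weights are random (i.e. untrained) and the 8-dimensional feature encoding is a made-up stand-in; in practice the weights would be fitted from observed rewards:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny fully connected network: (state, action) features -> one Q-value.
# Weights are random here, i.e. untrained; training them is the hard part.
W1 = rng.normal(size=(8, 16))   # input features -> hidden layer
b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1))   # hidden layer -> scalar Q-value
b2 = np.zeros(1)

def predict_q(state_action_features):
    h = np.maximum(0.0, state_action_features @ W1 + b1)  # ReLU hidden layer
    return (h @ W2 + b2)[0]                               # scalar Q estimate

x = rng.normal(size=8)  # stand-in for an encoded (state, action) pair
print(predict_q(x))     # one Q-value, instead of one table row per pair
```

The point is that a fixed set of weights replaces an arbitrarily large table: nearby states produce nearby feature vectors, so the network generalizes across them.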
Deep Reinforcement Learning
Deep Neural Networks
To be able to do that for complicated games, the NN may need to be "deep", meaning a few hidden layers may not suffice to capture all the intricate details of that knowledge; hence the use of deep NNs (many hidden layers).
The extra hidden layers allow the network to internally come up with features that help it learn and generalize complex problems that may have been impossible for a shallow network.
Closing words
In short, the deep neural network allows reinforcement learning to be applied to larger problems. You can use any function approximator instead of a NN to approximate Q, and if you do choose NNs, it doesn't absolutely have to be a deep one. It's just that researchers have had great success using deep ones recently.
- Thank you very much for your comprehensive answer. So, as I understand it, **deep** refers to the approximation of Q through a neural network and the associated possibility of using reinforcement learning at a larger scale. – Christopher Klaus Jun 23 '16 at 18:55
- "Deep" would be from [Deep Learning](https://en.wikipedia.org/wiki/Deep_learning) (emphasis on the multiple processing layers). To generalize, we could argue that the Deep RL label could be applied to any RL scheme that has a deep learning component to it. E.g. [this paper](https://www.aaai.org/ocs/index.php/WS/AAAIW11/paper/viewFile/3898/4303) uses [_Deep Belief Networks_](https://en.wikipedia.org/wiki/Deep_belief_network) as the approximator. Before [you ask](http://stats.stackexchange.com/questions/51273/what-is-the-difference-between-a-neural-network-and-a-deep-belief-network) :) – bakkal Jun 23 '16 at 19:16
- Other papers: [Deep Auto-Encoder Neural Networks in Reinforcement Learning](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.172.1873&rep=rep1&type=pdf), and perhaps the best-known one, [Playing Atari with Deep Reinforcement Learning](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf), which uses a deep (and convolutional) neural network. – bakkal Jun 23 '16 at 19:31
Summary: Deep RL uses a Deep Neural Network to approximate Q(s,a); non-deep RL defines Q(s,a) with a tabular function.
Popular Reinforcement Learning algorithms use functions Q(s,a) or V(s) to estimate the return (the sum of discounted rewards). The function can be defined as a tabular mapping of discrete inputs and outputs. However, a table is limiting for continuous states or an infinite/very large number of states, so a more generalized approach is necessary.
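The return that Q and V estimate can be written out directly. A small sketch of the discounted sum, with a discount factor gamma chosen here for illustration:

```python
def discounted_return(rewards, gamma=0.9):
    """Return G = r0 + gamma*r1 + gamma^2*r2 + ... for a reward sequence."""
    g = 0.0
    for r in reversed(rewards):  # fold from the last reward backwards
        g = r + gamma * g
    return g

# G = 1 + 0.9*0 + 0.9**2 * 2 = 2.62
print(discounted_return([1.0, 0.0, 2.0]))
```

Q(s,a) and V(s) are estimates of this quantity: the expected return when starting from a given state (and, for Q, taking a given first action).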
Function approximation is used for large state spaces. A popular function-approximation method is neural networks; you can make a deep neural network by adding many hidden layers.
Thus, Deep Reinforcement Learning uses function approximation, as opposed to tabular functions. Specifically, DRL uses Deep Neural Networks to approximate Q or V (or even A, the advantage function).
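For contrast, the tabular baseline that deep RL replaces can be sketched as a single Q-learning update on a dictionary. The state/action names and step sizes below are illustrative; in deep RL the table lookup becomes a network forward pass and the update becomes a gradient step:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
    return Q[(s, a)]

Q = {}
q_update(Q, "s0", "up", 1.0, "s1", ["up", "down"])  # Q[("s0","up")] becomes 0.1
```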

- Oh, so not just a DNN for the policy, but also a DNN to be used in place of the Q-function? – Dee Jan 22 '21 at 11:02
- You can do function approximation to the policy function or to the value function. Or both. Policy approximation, in this case, is achieved by Deep Neural Networks. – Luis B Feb 03 '21 at 19:33