Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (cumulative reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) vs. exploration (acting randomly to discover new states or better actions than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy, sketched below.
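For illustration, a minimal tabular Q-learning loop with an epsilon-greedy policy might look like the following. This is a sketch, not a definitive implementation: `env` is a hypothetical small discrete environment whose `reset()` returns a state index and whose `step(a)` returns `(next_state, reward, done)`, and the sizes and hyperparameters are placeholders.

    import numpy as np

    n_states, n_actions = 16, 4          # placeholder sizes for a small discrete task
    Q = np.zeros((n_states, n_actions))  # tabular action-value function
    alpha, gamma, epsilon = 0.1, 0.99, 0.1

    for episode in range(1000):
        s = env.reset()                  # hypothetical env, see lead-in
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # Q-learning update: bootstrap from the greedy next action;
            # the bootstrap term is zeroed on terminal transitions.
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
            s = s_next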

447 questions
146 votes, 8 answers

What is the difference between Q-learning and SARSA?

Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard (for me) to see any difference between these two algorithms. According to the book Reinforcement Learning: An Introduction (by Sutton and…
asked by Ælex (14,432 rep)
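The distinction is easiest to see in the update rules themselves (standard notation from Sutton and Barto): both are temporal-difference updates, but Q-learning bootstraps from the greedy next action, while SARSA bootstraps from the action actually taken, which is what makes it on-policy.

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]  \quad \text{(Q-learning)}
    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]  \quad \text{(SARSA)}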
51 votes, 6 answers

How can I apply reinforcement learning to continuous action spaces?

I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning). I'm hoping to use the Q-learning technique, but while I've…
asked by zergylord (4,368 rep)
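One pragmatic workaround, when the action space is low-dimensional, is to discretize it so that ordinary Q-learning still applies. A hypothetical sketch for 2-D mouse deltas follows; the 5-level grid is an arbitrary choice, not part of the question.

    import itertools
    import numpy as np

    # Discretize continuous 2-D mouse movement into a small grid of actions.
    levels = np.linspace(-1.0, 1.0, 5)                 # 5 settings per axis
    actions = list(itertools.product(levels, levels))  # 25 discrete (dx, dy) pairs

    def to_continuous(action_index):
        # Map the discrete action chosen by Q-learning back to a (dx, dy) delta.
        return actions[action_index]

For genuinely continuous control, actor-critic methods that output actions directly (e.g. DDPG or NAF) avoid the combinatorial blow-up this discretization runs into as the action dimension grows.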
41 votes, 3 answers

What is the difference between Q-learning and Value Iteration?

How is Q-learning different from value iteration in reinforcement learning? I know Q-learning is model-free and training samples are transitions (s, a, s', r). But since we know the transitions and the reward for every transition in Q-learning, is…
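The key point the excerpt stumbles on: value iteration needs the model (the transition probabilities P and reward function R) to do a full expected backup, whereas Q-learning only ever sees sampled transitions and does not know P or R.

    V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V_k(s') \right]  \quad \text{(value iteration)}
    Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]  \quad \text{(Q-learning)}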
35 votes, 2 answers

What is the difference between reinforcement learning and deep RL?

What is the difference between deep reinforcement learning and reinforcement learning? I basically know what reinforcement learning is about, but what does the concrete term deep stand for in this context?
23 votes, 2 answers

Policy Gradients in Keras

I've been trying to build a model using 'Deep Q-Learning' where I have a large number of actions (2908). After some limited success with using standard DQN: (https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf), I decided to do some more research because…
asked by simeon (585 rep)
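One common way to express the REINFORCE loss in Keras is to weight a categorical cross-entropy loss by the episode returns: cross-entropy against the one-hot encoding of the taken action equals -log pi(a|s), so per-sample weighting yields the policy-gradient update without a custom loss. A minimal sketch, with sizes, layer widths, and function names as placeholder assumptions:

    import tensorflow as tf

    state_dim, n_actions = 8, 2908          # placeholders for the real problem sizes
    policy = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(n_actions, activation="softmax"),
    ])
    # Cross-entropy vs. the one-hot taken action is -log pi(a|s); weighting each
    # sample by its (ideally baseline-subtracted) return gives the REINFORCE gradient.
    policy.compile(optimizer="adam", loss="categorical_crossentropy")

    def train_on_episode(states, actions, returns):
        # states: (batch, state_dim); actions: (batch,) ints; returns: (batch,) floats
        targets = tf.one_hot(actions, n_actions)
        policy.fit(states, targets, sample_weight=returns, verbose=0)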
23 votes, 1 answer

Q-learning vs temporal-difference vs model-based reinforcement learning

I'm in a course called "Intelligent Machines" at the university. We were introduced to three methods of reinforcement learning, and were given an intuition of when to use them, and I quote: Q-Learning - Best when MDP can't be solved.…
22 votes, 2 answers

DQN - Q-Loss not converging

I'm using the DQN algorithm to train an agent in my environment, which looks like this: the agent controls a car by picking discrete actions (left, right, up, down); the goal is to drive at a desired speed without crashing into other cars; the state…
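Two stabilizers from the original DQN paper are usually the first things to check when the Q-loss diverges: an experience replay buffer (decorrelates training samples) and a target network that is only synced periodically (freezes the bootstrap target). A framework-agnostic sketch, assuming Keras-style networks with get_weights/set_weights:

    import random
    from collections import deque

    replay = deque(maxlen=100_000)   # experience replay buffer (size is an assumption)

    def remember(s, a, r, s_next, done):
        replay.append((s, a, r, s_next, done))

    def sample_batch(batch_size=32):
        # Train on uncorrelated minibatches instead of consecutive transitions.
        return random.sample(replay, batch_size)

    def sync_target(online_net, target_net, step, period=1000):
        # Copy weights only every `period` steps so the TD target stays fixed
        # between syncs; compute targets with target_net, gradients on online_net.
        if step % period == 0:
            target_net.set_weights(online_net.get_weights())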
21 votes, 2 answers

Optimal epsilon (ϵ-greedy) value

ϵ-greedy policy: I know the Q-learning algorithm should try to balance between exploration and exploitation. Since I'm a beginner in this field, I wanted to implement a simple version of exploration/exploitation behavior. Optimal epsilon value: My…
asked by OccamsMan (235 rep)
20 votes, 3 answers

Why doesn't my Deep Q Network master a simple Gridworld (Tensorflow)? (How to evaluate a Deep-Q-Net)

I'm trying to familiarize myself with Q-learning and deep neural networks, and am currently trying to implement Playing Atari with Deep Reinforcement Learning. To test my implementation and play around with it, I thought I'd try a simple gridworld, where I have an N…
asked by natschz (1,007 rep)
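The second half of the title has a short general answer: evaluate periodically with exploration switched off and track the average greedy return (the Atari paper additionally tracks mean max-Q over a fixed set of held-out states). A minimal sketch, assuming a Keras-style q_net and an env with a hypothetical (state, reward, done) step contract:

    import numpy as np

    def evaluate(env, q_net, episodes=20):
        # Average return of the purely greedy policy (epsilon = 0).
        total = 0.0
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                a = int(np.argmax(q_net.predict(s[None, :], verbose=0)[0]))
                s, r, done = env.step(a)
                total += r
        return total / episodes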
18 votes, 3 answers

Epsilon and learning rate decay in epsilon greedy q learning

I understand that epsilon marks the trade-off between exploration and exploitation. At the beginning, you want epsilon to be high so that you take big leaps and learn things. As you learn about future rewards, epsilon should decay so that you can…
asked by maddie (1,854 rep)
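There is no single optimal schedule, but a common pattern is to decay epsilon (and often the learning rate) toward a small floor, so that late training is mostly exploitation while some exploration is kept forever. A hypothetical exponential schedule; all the constants here are assumptions, not recommendations:

    eps_start, eps_min, eps_decay = 1.0, 0.05, 0.995       # assumed hyperparameters
    alpha_start, alpha_min, alpha_decay = 0.5, 0.01, 0.999

    epsilon, alpha = eps_start, alpha_start
    for episode in range(n_episodes):                      # n_episodes: placeholder
        # ... run one episode using the current epsilon and alpha ...
        epsilon = max(eps_min, epsilon * eps_decay)        # never fully stop exploring
        alpha = max(alpha_min, alpha * alpha_decay)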
13 votes, 1 answer

How to use Tensorflow Optimizer without recomputing activations in reinforcement learning program that returns control after each iteration?

EDIT(1/3/16): corresponding github issue I'm using Tensorflow (Python interface) to implement a q-learning agent with function approximation trained using stochastic gradient descent. At each iteration of the experiment, a step function in the agent…
12 votes, 9 answers

What's the difference between reinforcement learning, deep learning, and deep reinforcement learning?

What's the difference between reinforcement learning, deep learning, and deep reinforcement learning? Where does Q-learning fit in?
12 votes, 2 answers

Training only one output of a network in Keras

I have a network in Keras with many outputs, however, my training data only provides information for a single output at a time. At the moment my method for training has been to run a prediction on the input in question, change the value of the…
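The approach the excerpt describes (predict, then overwrite only one target value) is the standard DQN trick: since the other targets equal the network's own predictions, their squared-error gradient is zero and only the chosen output is trained. A minimal sketch with a hypothetical Keras model q_net and placeholder arrays:

    import numpy as np

    # q_net: hypothetical Keras model mapping states -> one Q-value per action.
    # states: (batch, state_dim); actions: (batch,) ints; td_targets: (batch,) floats
    q_values = q_net.predict(states)                         # shape (batch, n_actions)
    targets = q_values.copy()                                # untouched outputs train toward themselves
    targets[np.arange(len(actions)), actions] = td_targets   # overwrite only taken actions
    q_net.fit(states, targets, verbose=0)                    # MSE gradient is zero elsewhere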
11 votes, 3 answers

Q-learning vs dynamic programming

Is the classic Q-learning algorithm, using lookup table (instead of function approximation), equivalent to dynamic programming?
11 votes, 3 answers

Are Q-learning and SARSA with greedy selection equivalent?

The difference between Q-learning and SARSA is that Q-learning compares the current state and the best possible next state, whereas SARSA compares the current state against the actual next state. If a greedy selection policy is used, that is, the…
asked by Mouscellaneous (2,584 rep)