Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (cumulative reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) vs. exploration (acting randomly to discover new states or better actions than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy, sketched below.
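For illustration, a minimal tabular Q-learning loop with an epsilon-greedy policy might look like the following. This is a sketch, not a definitive implementation: `env` is a hypothetical small discrete environment whose `reset()` returns a state index and whose `step(a)` returns `(next_state, reward, done)`, and the sizes and hyperparameters are placeholders.

    import numpy as np

    n_states, n_actions = 16, 4          # placeholder sizes for a small discrete task
    Q = np.zeros((n_states, n_actions))  # tabular action-value function
    alpha, gamma, epsilon = 0.1, 0.99, 0.1

    for episode in range(1000):
        s = env.reset()                  # hypothetical env, see lead-in
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # Q-learning update: bootstrap from the greedy next action;
            # the bootstrap term is zeroed on terminal transitions.
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
            s = s_next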

447 questions
146 votes, 8 answers

What is the difference between Q-learning and SARSA?

Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard (for me) to see any difference between these two algorithms. According to the book Reinforcement Learning: An Introduction (by Sutton and…
asked by Ælex (14,432 rep)
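The distinction is easiest to see in the update rules themselves (standard notation from Sutton and Barto): both are temporal-difference updates, but Q-learning bootstraps from the greedy next action, while SARSA bootstraps from the action actually taken, which is what makes it on-policy.

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]  \quad \text{(Q-learning)}
    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]  \quad \text{(SARSA)}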
51 votes, 6 answers

How can I apply reinforcement learning to continuous action spaces?

I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning). I'm hoping to use the Q-learning technique, but while I've…
asked by zergylord (4,368 rep)
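One pragmatic workaround, when the action space is low-dimensional, is to discretize it so that ordinary Q-learning still applies. A hypothetical sketch for 2-D mouse deltas follows; the 5-level grid is an arbitrary choice, not part of the question.

    import itertools
    import numpy as np

    # Discretize continuous 2-D mouse movement into a small grid of actions.
    levels = np.linspace(-1.0, 1.0, 5)                 # 5 settings per axis
    actions = list(itertools.product(levels, levels))  # 25 discrete (dx, dy) pairs

    def to_continuous(action_index):
        # Map the discrete action chosen by Q-learning back to a (dx, dy) delta.
        return actions[action_index]

For genuinely continuous control, actor-critic methods that output actions directly (e.g. DDPG or NAF) avoid the combinatorial blow-up this discretization runs into as the action dimension grows.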
41 votes, 3 answers

What is the difference between Q-learning and Value Iteration?

How is Q-learning different from value iteration in reinforcement learning? I know Q-learning is model-free and training samples are transitions (s, a, s', r). But since we know the transitions and the reward for every transition in Q-learning, is…
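The key point the excerpt stumbles on: value iteration needs the model (the transition probabilities P and reward function R) to do a full expected backup, whereas Q-learning only ever sees sampled transitions and does not know P or R.

    V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V_k(s') \right]  \quad \text{(value iteration)}
    Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]  \quad \text{(Q-learning)}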
35 votes, 2 answers

What is the difference between reinforcement learning and deep RL?

What is the difference between deep reinforcement learning and reinforcement learning? I basically know what reinforcement learning is about, but what does the concrete term deep stand for in this context?
23 votes, 2 answers

Policy Gradients in Keras

I've been trying to build a model using 'Deep Q-Learning' where I have a large number of actions (2908). After some limited success with using standard DQN: (https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf), I decided to do some more research because…
asked by simeon (585 rep)
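One common way to express the REINFORCE loss in Keras is to weight a categorical cross-entropy loss by the episode returns: cross-entropy against the one-hot encoding of the taken action equals -log pi(a|s), so per-sample weighting yields the policy-gradient update without a custom loss. A minimal sketch, with sizes, layer widths, and function names as placeholder assumptions:

    import tensorflow as tf

    state_dim, n_actions = 8, 2908          # placeholders for the real problem sizes
    policy = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(n_actions, activation="softmax"),
    ])
    # Cross-entropy vs. the one-hot taken action is -log pi(a|s); weighting each
    # sample by its (ideally baseline-subtracted) return gives the REINFORCE gradient.
    policy.compile(optimizer="adam", loss="categorical_crossentropy")

    def train_on_episode(states, actions, returns):
        # states: (batch, state_dim); actions: (batch,) ints; returns: (batch,) floats
        targets = tf.one_hot(actions, n_actions)
        policy.fit(states, targets, sample_weight=returns, verbose=0)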
23 votes, 1 answer

Q-learning vs temporal-difference vs model-based reinforcement learning

I'm in a course called "Intelligent Machines" at the university. We were introduced to three methods of reinforcement learning, and were given an intuition of when to use them, and I quote: Q-Learning - Best when MDP can't be solved.…
22 votes, 2 answers

DQN - Q-Loss not converging

I'm using the DQN algorithm to train an agent in my environment, which looks like this: the agent controls a car by picking discrete actions (left, right, up, down); the goal is to drive at a desired speed without crashing into other cars; the state…
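Two stabilizers from the original DQN paper are usually the first things to check when the Q-loss diverges: an experience replay buffer (decorrelates training samples) and a target network that is only synced periodically (freezes the bootstrap target). A framework-agnostic sketch, assuming Keras-style networks with get_weights/set_weights:

    import random
    from collections import deque

    replay = deque(maxlen=100_000)   # experience replay buffer (size is an assumption)

    def remember(s, a, r, s_next, done):
        replay.append((s, a, r, s_next, done))

    def sample_batch(batch_size=32):
        # Train on uncorrelated minibatches instead of consecutive transitions.
        return random.sample(replay, batch_size)

    def sync_target(online_net, target_net, step, period=1000):
        # Copy weights only every `period` steps so the TD target stays fixed
        # between syncs; compute targets with target_net, gradients on online_net.
        if step % period == 0:
            target_net.set_weights(online_net.get_weights())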
21 votes, 2 answers

Optimal epsilon (ϵ-greedy) value

ϵ-greedy policy: I know the Q-learning algorithm should try to balance between exploration and exploitation. Since I'm a beginner in this field, I wanted to implement a simple version of exploration/exploitation behavior. Optimal epsilon value: My…
asked by OccamsMan (235 rep)
20 votes, 3 answers

Why doesn't my Deep Q Network master a simple Gridworld (Tensorflow)? (How to evaluate a Deep-Q-Net)

I'm trying to familiarize myself with Q-learning and deep neural networks, and am currently trying to implement Playing Atari with Deep Reinforcement Learning. To test my implementation and play around with it, I thought I'd try a simple gridworld, where I have an N…
asked by natschz (1,007 rep)
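The second half of the title has a short general answer: evaluate periodically with exploration switched off and track the average greedy return (the Atari paper additionally tracks mean max-Q over a fixed set of held-out states). A minimal sketch, assuming a Keras-style q_net and an env with a hypothetical (state, reward, done) step contract:

    import numpy as np

    def evaluate(env, q_net, episodes=20):
        # Average return of the purely greedy policy (epsilon = 0).
        total = 0.0
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                a = int(np.argmax(q_net.predict(s[None, :], verbose=0)[0]))
                s, r, done = env.step(a)
                total += r
        return total / episodes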
18 votes, 3 answers

Epsilon and learning rate decay in epsilon greedy q learning

I understand that epsilon marks the trade-off between exploration and exploitation. At the beginning, you want epsilon to be high so that you take big leaps and learn things. As you learn about future rewards, epsilon should decay so that you can…
asked by maddie (1,854 rep)
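There is no single optimal schedule, but a common pattern is to decay epsilon (and often the learning rate) toward a small floor, so that late training is mostly exploitation while some exploration is kept forever. A hypothetical exponential schedule; all the constants here are assumptions, not recommendations:

    eps_start, eps_min, eps_decay = 1.0, 0.05, 0.995       # assumed hyperparameters
    alpha_start, alpha_min, alpha_decay = 0.5, 0.01, 0.999

    epsilon, alpha = eps_start, alpha_start
    for episode in range(n_episodes):                      # n_episodes: placeholder
        # ... run one episode using the current epsilon and alpha ...
        epsilon = max(eps_min, epsilon * eps_decay)        # never fully stop exploring
        alpha = max(alpha_min, alpha * alpha_decay)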
13 votes, 1 answer

How to use Tensorflow Optimizer without recomputing activations in reinforcement learning program that returns control after each iteration?

EDIT(1/3/16): corresponding github issue I'm using Tensorflow (Python interface) to implement a q-learning agent with function approximation trained using stochastic gradient descent. At each iteration of the experiment, a step function in the agent…
12 votes, 9 answers

What's the difference between reinforcement learning, deep learning, and deep reinforcement learning?

What's the difference between reinforcement learning, deep learning, and deep reinforcement learning? Where does Q-learning fit in?
12 votes, 2 answers

Training only one output of a network in Keras

I have a network in Keras with many outputs, however, my training data only provides information for a single output at a time. At the moment my method for training has been to run a prediction on the input in question, change the value of the…
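The approach the excerpt describes (predict, then overwrite only one target value) is the standard DQN trick: since the other targets equal the network's own predictions, their squared-error gradient is zero and only the chosen output is trained. A minimal sketch with a hypothetical Keras model q_net and placeholder arrays:

    import numpy as np

    # q_net: hypothetical Keras model mapping states -> one Q-value per action.
    # states: (batch, state_dim); actions: (batch,) ints; td_targets: (batch,) floats
    q_values = q_net.predict(states)                         # shape (batch, n_actions)
    targets = q_values.copy()                                # untouched outputs train toward themselves
    targets[np.arange(len(actions)), actions] = td_targets   # overwrite only taken actions
    q_net.fit(states, targets, verbose=0)                    # MSE gradient is zero elsewhere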
11 votes, 3 answers

Q-learning vs dynamic programming

Is the classic Q-learning algorithm, using lookup table (instead of function approximation), equivalent to dynamic programming?
11 votes, 3 answers

Are Q-learning and SARSA with greedy selection equivalent?

The difference between Q-learning and SARSA is that Q-learning compares the current state and the best possible next state, whereas SARSA compares the current state against the actual next state. If a greedy selection policy is used, that is, the…
asked by Mouscellaneous (2,584 rep)