
I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning).

I'm hoping to use the Q-learning technique, but while I've found a way to extend this method to continuous state spaces, I can't seem to figure out how to accommodate a problem with a continuous action space.

I could just force all mouse movement to be of a certain magnitude and in only a certain number of different directions, but any reasonable way of making the actions discrete would yield a huge action space. Since standard Q-learning requires the agent to evaluate all possible actions, such an approximation doesn't solve the problem in any practical sense.
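To make the blow-up concrete, here is a quick back-of-the-envelope count (the screen resolution is just an illustrative assumption):

    # Rough size of a naively discretized mouse-movement action space,
    # assuming (for illustration) pixel-level displacements on a 1920x1080 screen.
    width, height = 1920, 1080

    # Every displacement (dx, dy) with dx in [-(width-1), width-1] and
    # dy in [-(height-1), height-1] is treated as its own action:
    n_actions = (2 * width - 1) * (2 * height - 1)
    print(n_actions)  # 8,288,401 actions -- far too many to sweep in max_a Q(s, a)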

zergylord

6 Answers


The common way of dealing with this problem is with actor-critic methods. These naturally extend to continuous action spaces. Basic Q-learning can diverge when combined with function approximation; however, if you still want to use it, you can try combining it with a self-organizing map, as done in "Applications of the self-organising map to reinforcement learning". The paper also contains some further references you might find useful.
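For illustration, here is a minimal actor-critic sketch for a continuous 2-D action such as a mouse displacement, written in PyTorch. The network sizes, the state-independent Gaussian policy, and the one-step TD update are illustrative assumptions, not the specific construction from the paper above:

    import torch
    import torch.nn as nn

    state_dim, action_dim = 8, 2   # e.g. some screen features and a (dx, dy) mouse move

    # Actor: Gaussian policy over continuous actions. Critic: state value V(s).
    actor = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
    log_std = nn.Parameter(torch.zeros(action_dim))          # state-independent log std
    critic = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    actor_opt = torch.optim.Adam(list(actor.parameters()) + [log_std], lr=1e-3)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    def act(state):
        """Sample a continuous action and return it with its log-probability."""
        dist = torch.distributions.Normal(actor(state), log_std.exp())
        action = dist.sample()
        return action, dist.log_prob(action).sum()

    def update(state, log_prob, reward, next_state, done, gamma=0.99):
        """One-step TD update: the critic fits V(s); the actor follows the TD error."""
        with torch.no_grad():
            target = reward + gamma * (1.0 - done) * critic(next_state)
        td_error = target - critic(state)

        critic_loss = td_error.pow(2).mean()
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        actor_loss = (-log_prob * td_error.detach()).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

    # One interaction step with dummy data:
    s, s_next = torch.randn(state_dim), torch.randn(state_dim)
    a, logp = act(s)
    update(s, logp, reward=torch.tensor(1.0), next_state=s_next, done=torch.tensor(0.0))

The point is that the actor outputs a continuous action directly, so no maximization over a discretized action set is ever needed.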

Don Reba
  • Oh wow, both of those sound spot-on. I'll test them out and accept your answer if they work as I expect they will. – zergylord Aug 18 '11 at 02:07
  • @Gulzar I can access both links. – Vivek Payasi Jan 27 '21 at 11:21
  • The SOM idea is neat: in essence, it discretizes both the state space and the action space into the nodes of a SOM (one map for each space). The discretization is not uniform, so the more important states would be represented more densely by the SOM. – Yan King Yin Aug 18 '22 at 12:23
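For illustration, a minimal numpy sketch of that SOM-based discretization of the action space. The node count, learning-rate schedule, and the uniform training samples are all illustrative assumptions; in practice you would fit the map to the actions the agent actually encounters:

    import numpy as np

    rng = np.random.default_rng(0)

    # A 1-D self-organizing map over continuous 2-D actions (mouse displacements).
    n_nodes = 25
    nodes = rng.uniform(-1.0, 1.0, size=(n_nodes, 2))        # prototype actions

    def som_update(nodes, sample, t, n_steps, lr0=0.5, sigma0=5.0):
        """Pull the best-matching node (and its grid neighbours) toward the sample."""
        lr = lr0 * (1.0 - t / n_steps)
        sigma = max(sigma0 * (1.0 - t / n_steps), 0.5)
        bmu = np.argmin(np.linalg.norm(nodes - sample, axis=1))   # best-matching unit
        grid_dist = np.abs(np.arange(len(nodes)) - bmu)
        h = np.exp(-grid_dist ** 2 / (2 * sigma ** 2))            # neighbourhood kernel
        nodes += lr * h[:, None] * (sample - nodes)               # in-place prototype update

    # Fit the map; random displacements stand in for the actions the agent visits.
    n_steps = 5000
    for t in range(n_steps):
        som_update(nodes, rng.uniform(-1.0, 1.0, size=2), t, n_steps)

    # Q-learning then treats the node index (0 .. n_nodes-1) as a discrete action
    # and executes the continuous displacement nodes[index].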

Fast forward a few years: folks from DeepMind proposed a deep reinforcement learning actor-critic method for dealing with both continuous state and action spaces. It is based on a technique called the deterministic policy gradient. See the paper Continuous control with deep reinforcement learning and some implementations.
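For a rough sense of the mechanics, here is a compressed PyTorch sketch of a DDPG-style update. The network sizes, exploration noise, and hyperparameters are illustrative assumptions, and replay-buffer handling is omitted:

    import copy
    import torch
    import torch.nn as nn

    state_dim, action_dim = 8, 2

    actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                          nn.Linear(64, action_dim), nn.Tanh())      # deterministic a = mu(s)
    critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                           nn.Linear(64, 1))                          # Q(s, a)
    actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)

    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    def act(state, noise_std=0.1):
        """Deterministic action plus Gaussian exploration noise."""
        with torch.no_grad():
            a = actor(state) + noise_std * torch.randn(action_dim)
        return a.clamp(-1.0, 1.0)

    def update(s, a, r, s2, done, gamma=0.99, tau=0.005):
        # Critic: one-step TD target built with the *target* networks.
        with torch.no_grad():
            q_next = critic_target(torch.cat([s2, actor_target(s2)], dim=-1))
            q_target = r + gamma * (1.0 - done) * q_next
        critic_loss = (critic(torch.cat([s, a], dim=-1)) - q_target).pow(2).mean()
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        # Actor: follow the gradient of Q with respect to the deterministic action.
        actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        # Polyak averaging of the target networks.
        for net, target in ((actor, actor_target), (critic, critic_target)):
            for p, p_t in zip(net.parameters(), target.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)

The key point is that the critic's gradient with respect to the action flows back through the deterministic actor, so again no discretization of the mouse movement is required.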

zaxliu
  • Yeah, they've really popularized reinforcement learning -- now there are quite a few ways to handle continuous actions! The most relevant, I believe, is Q-learning with normalized advantage functions, since it's the same Q-learning algorithm at its heart. It just forces the action values to take a quadratic form, from which you can get the greedy action analytically. https://arxiv.org/pdf/1603.00748.pdf – zergylord Aug 05 '16 at 21:11
  • You'll also want to check out the Atari paper https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf – Shaun Mar 06 '17 at 19:39
  • For quick reference, the method in the paper that @zergylord has provided a link to is called NAF (normalized advantage function) – Chiffa Aug 23 '17 at 12:24

There are numerous ways to extend reinforcement learning to continuous actions. One is to use actor-critic methods; another is to use policy gradient methods.

A rather extensive explanation of different methods can be found in the following paper, which is available online: Reinforcement Learning in Continuous State and Action Spaces (by Hado van Hasselt and Marco A. Wiering).
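As a point of comparison with the actor-critic sketch above, here is what a plain policy-gradient (REINFORCE) update looks like for a continuous Gaussian policy. The sizes are illustrative; there is no critic, and the Monte-Carlo return weights the log-probability gradient directly:

    import torch
    import torch.nn as nn

    state_dim, action_dim = 8, 2
    policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
    log_std = nn.Parameter(torch.zeros(action_dim))
    opt = torch.optim.Adam(list(policy.parameters()) + [log_std], lr=1e-3)

    def reinforce_update(states, actions, returns):
        """states: [T, state_dim], actions: [T, action_dim], returns: [T] discounted returns."""
        dist = torch.distributions.Normal(policy(states), log_std.exp())
        log_probs = dist.log_prob(actions).sum(dim=-1)        # [T]
        loss = -(log_probs * returns).mean()                  # maximize expected return
        opt.zero_grad()
        loss.backward()
        opt.step()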

Peter
  • Actor-critic methods are a type of policy gradient method. The actor, which is parameterized, implements the policy, and the parameters are shifted in the direction of the gradient of the actor's performance, which is estimated by the critic. – HelloGoodbye Mar 01 '17 at 17:58

For what you're doing, I don't believe you need to work in a continuous action space. Although the physical mouse moves in a continuous space, internally the cursor only moves in discrete steps (usually at the pixel level), so any precision finer than that threshold seems like it won't have any effect on your agent's performance. The state space is still quite large, but it is finite and discrete.

templatetypedef
  • This introduces the problem I mentioned with regard to discrete approximations (though I realize my domain is technically discrete to begin with), which is that it's infeasible to treat every possible coordinate pair as a possible action. – zergylord Aug 18 '11 at 02:05
  • I agree with @templatetypedef. You can use discrete actions with a continuous state space. Discrete actions are much nicer to work with. – danelliottster Feb 18 '15 at 23:17
  • For my problem, the issue is actually not continuity but that the discrete action space is too large to be represented as a table, or even as a finite-dimensional vector output by a neural network. So I need an alternative method to deal with the action space. – Yan King Yin Aug 18 '22 at 12:30

I know this post is somewhat old, but in 2016, a variant of Q-learning applied to continuous action spaces was proposed, as an alternative to actor-critic methods. It is called normalized advantage functions (NAF). Here's the paper: Continuous Deep Q-Learning with Model-based Acceleration
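For a sense of how that works, here is a small PyTorch sketch of the quadratic ("normalized advantage") Q-head. The layer sizes and the way the positive-definite matrix P(s) is built are illustrative assumptions:

    import torch
    import torch.nn as nn

    state_dim, action_dim = 8, 2

    class NAFHead(nn.Module):
        """Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)), so argmax_a Q(s, a) = mu(s)."""
        def __init__(self):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh())
            self.v = nn.Linear(64, 1)                         # state value V(s)
            self.mu = nn.Linear(64, action_dim)               # greedy action mu(s)
            self.l = nn.Linear(64, action_dim * action_dim)   # entries of L(s)

        def forward(self, s, a):
            # s: [batch, state_dim], a: [batch, action_dim]
            h = self.body(s)
            mu = self.mu(h)
            # Build a positive-definite P(s) = L L^T from a lower-triangular L
            # with an exponentiated diagonal.
            L = torch.tril(self.l(h).view(-1, action_dim, action_dim))
            diag = torch.diagonal(L, dim1=-2, dim2=-1)
            L = L - torch.diag_embed(diag) + torch.diag_embed(diag.exp())
            P = L @ L.transpose(-2, -1)
            d = (a - mu).unsqueeze(-1)                        # [batch, action_dim, 1]
            advantage = -0.5 * (d.transpose(-2, -1) @ P @ d).squeeze(-1)
            return self.v(h) + advantage, mu                  # Q(s, a) and the greedy action

    q_net = NAFHead()
    s, a = torch.randn(4, state_dim), torch.randn(4, action_dim)
    q_values, greedy = q_net(s, a)

Because the advantage term is never positive and vanishes at a = mu(s), the greedy action is mu(s) and the bootstrap target max_a' Q(s', a') is simply V(s'), so standard Q-learning goes through without any search over actions.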

Santiago Benoit

Another paper to make the list, from the value-based school, is Input Convex Neural Networks. The idea is to constrain the network so that Q(s, a) is concave in the actions (its negative is convex in a, though not necessarily in the states). Solving the argmax-Q inference then reduces to a convex optimization problem whose global optimum can be found efficiently, much faster than an exhaustive sweep and easier to implement than other value-based approaches. Yet this likely comes at the expense of reduced representational power compared to the usual feedforward or convolutional neural networks.
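To make the construction a bit more tangible, here is a small PyTorch sketch of a partially input-convex network for -Q(s, a), together with a projected-gradient argmax over the action. The architecture, the weight clamping, and the gradient-descent settings are illustrative assumptions rather than the paper's exact inference procedure:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    state_dim, action_dim, hidden = 8, 2, 64

    class NegQ(nn.Module):
        """Models -Q(s, a); convex in a thanks to non-negative weights on the convex path
        and convex, non-decreasing activations (ReLU)."""
        def __init__(self):
            super().__init__()
            self.s1 = nn.Linear(state_dim, hidden)        # state path, unconstrained
            self.a1 = nn.Linear(action_dim, hidden)
            self.z2 = nn.Linear(hidden, 1, bias=False)    # weights kept non-negative
            self.a2 = nn.Linear(action_dim, 1)
            self.s2 = nn.Linear(state_dim, 1)

        def forward(self, s, a):
            z1 = F.relu(self.s1(s) + self.a1(a))          # affine in a, then relu: convex
            w = self.z2.weight.clamp(min=0.0)             # non-negativity preserves convexity
            return F.linear(z1, w) + self.a2(a) + self.s2(s)

    def greedy_action(negq, s, n_steps=50, lr=0.1):
        """Projected gradient descent on the convex objective -Q(s, a), with a in [-1, 1]^d."""
        a = torch.zeros(s.shape[:-1] + (action_dim,), requires_grad=True)
        for _ in range(n_steps):
            grad_a, = torch.autograd.grad(negq(s, a).sum(), a)
            with torch.no_grad():
                a -= lr * grad_a
                a.clamp_(-1.0, 1.0)
        return a.detach()

    negq = NegQ()
    s = torch.randn(4, state_dim)
    a_star = greedy_action(negq, s)    # one greedy action per state

Because the objective is convex in a, this local search cannot get stuck in a bad local optimum, which is what makes the greedy step tractable.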

dhfromkorea