Reinforcement Learning in arbitrarily large action/state spaces

Question

I’m interested in using Deep Reinforcement Learning to find a unique optimal path back home among (too many) possibilities, with a few (required) intermediate stops (for instance, buying a coffee or refueling).

Furthermore, I want to apply this in cases where the agent doesn’t know a “model” of the environment and can't try all possible combinations of states and actions. That is, it needs approximation techniques for the Q-value function (and/or the policy).

I’ve read about methods for cases like this, where rewards, if any, are sparse and binary, such as Monte Carlo Tree Search (which, as I understand it, implies some sort of modeling and planning) or Hindsight Experience Replay (HER), which is used on top of off-policy algorithms such as DDPG.
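To make the HER idea concrete, here is a minimal sketch of its goal-relabeling step using the "final" strategy: every transition of a failed episode is stored a second time with the goal replaced by the state the agent actually ended in, so at least some stored transitions carry a non-negative reward. The grid-position `State`, the `Transition` tuple, and the helpers `sparse_reward` and `her_relabel` are hypothetical names chosen for illustration, not part of any library, and the off-policy learner that would consume the buffer (DDPG, DQN, ...) is left out.

```python
from typing import List, NamedTuple, Tuple

State = Tuple[int, int]  # hypothetical: an (x, y) position on a grid


class Transition(NamedTuple):
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State) -> float:
    """Binary reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def her_relabel(episode: List[Transition]) -> List[Transition]:
    """Return the episode plus copies relabeled with the final achieved state as goal."""
    final_achieved = episode[-1].next_state
    relabeled = [
        t._replace(
            goal=final_achieved,
            reward=sparse_reward(t.next_state, final_achieved),
        )
        for t in episode
    ]
    return episode + relabeled


# A short episode that never reaches "home", so every original reward is -1.
home: State = (5, 5)
states = [(0, 0), (1, 0), (1, 1), (2, 1)]
episode = [
    Transition(s, 0, s_next, home, sparse_reward(s_next, home))
    for s, s_next in zip(states, states[1:])
]

# The buffer now also holds transitions whose goal is (2, 1); the last of
# them has reward 0, which gives the learner a signal it would not get otherwise.
buffer = her_relabel(episode)
```

The relabeled transitions are only valid for a goal-conditioned policy or Q-function, i.e. one that takes the goal as an extra input alongside the state.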

But there are so many different kinds of algorithms to consider that I’m a bit confused about what’s best to begin with. I know it’s a difficult problem, and maybe it’s too naive to ask this, but is there any clear, direct and well-known way to solve the problem I want to tackle?

Thanks a lot!

Matias

It is a very generic question and the answer depends on too many things. I think this is a good place to start https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html It is a quick and clear recap of the most famous RL algorithms, from the old policy gradient to the latest ones, with link to papers and implementations. — Simon, Mar 15 '19 at 17:22

score 0 · Answer 1 · answered May 29 '19 at 22:05

If the final destination is fixed, as it is here (home), you can use a dynamic search method, since A* will not work in a changing environment. If you want to use a deep learning algorithm, go for A3C with experience replay, because of the large action/state spaces. It is capable of handling complex problems.
