I’m interested in using Deep Reinforcement Learning to find a single optimal path back home among (too many) possibilities, with a few required intermediate stops (for instance, buying a coffee or refueling).
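To make the setting concrete, here is a minimal sketch of the kind of environment I mean. It assumes a small grid world: the grid size, start, home cell and stop positions are all hypothetical. The state is the current position plus the set of stops already visited, and the reward is 1 only when the agent reaches home after visiting every required stop, and 0 otherwise.

```python
class StopsGridEnv:
    """Hypothetical grid world: reach `home` after visiting every required stop."""

    MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up

    def __init__(self, size=5, start=(0, 0), home=(4, 4), stops=((0, 4), (2, 2))):
        # Assumed layout: stops stand in for e.g. a coffee shop and a gas station.
        self.size, self.start, self.home = size, start, home
        self.stops = frozenset(stops)
        self.reset()

    def reset(self):
        self.pos = self.start
        self.visited = frozenset()
        return (self.pos, self.visited)

    def step(self, action):
        dr, dc = self.MOVES[action]
        # Clamp the move so the agent stays inside the grid.
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        if self.pos in self.stops:
            self.visited = self.visited | {self.pos}
        # Episode ends only at home with all stops visited.
        done = self.pos == self.home and self.visited == self.stops
        reward = 1.0 if done else 0.0  # sparse, binary reward
        return (self.pos, self.visited), reward, done
```

Going straight home without the stops yields no reward at all, which is exactly what makes the reward so sparse.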
Furthermore, I want to apply this in cases where the agent doesn’t have a “model” of the environment and can’t possibly try every combination of states and actions, i.e. where it needs approximation techniques for the Q-value function (and/or the policy).
I’ve read about methods for cases like this, where rewards (if any) are sparse and binary, such as Monte Carlo Tree Search (which, as I understand it, implies some sort of modeling and planning) or Hindsight Experience Replay (HER) used together with DDPG.
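As I understand HER, the core trick is to store each episode a second time as if the goal had been the state the agent actually reached, so that even failed episodes produce some reward signal. Below is a minimal sketch of that relabeling using the "final" strategy (relabel with the episode's last achieved state). The `Transition` tuple and function names are my own hypothetical choices, and goals here are plain positions; with required stops, the goal would presumably also have to encode which stops were visited.

```python
from typing import List, NamedTuple, Tuple

State = Tuple[int, int]  # assumed: a goal is just a grid position


class Transition(NamedTuple):
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State) -> float:
    # Binary reward: 1 only when the achieved state matches the goal.
    return 1.0 if achieved == goal else 0.0


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """Return the original transitions plus copies whose goal is the final achieved state."""
    if not episode:
        return []
    achieved_goal = episode[-1].next_state
    relabeled = [
        t._replace(goal=achieved_goal,
                   reward=sparse_reward(t.next_state, achieved_goal))
        for t in episode
    ]
    return episode + relabeled
```

Both the original and the relabeled transitions would then go into the replay buffer of an off-policy learner such as DDPG or DQN.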
But there are so many different kinds of algorithms to consider that I’m a bit confused about which one to start with. I know it’s a difficult problem, and maybe it’s naive to ask, but is there a clear, direct and well-known way to solve the problem I’m facing?
Thanks a lot!
Matias