Implementations of Hierarchical Reinforcement Learning

Question

Can anyone recommend a reinforcement learning library or framework that can handle large state spaces by abstracting them?

I'm attempting to implement the intelligence for a small agent in a game world. The agent is represented by a small two-wheeled robot that can move forward and backwards, and turn left and right. It has a couple sensors for detecting a boundary on the ground, a couple ultrasonic sensors for detecting objects far away, and a couple bump sensors for detecting contact with an object or opponent. It also can do some simple dead reckoning to estimate its position in the world using its starting position as a reference. So all the state features available to it are:

edge_detected=0|1
edge_left=0|1
edge_right=0|1
edge_both=0|1
sonar_detected=0|1
sonar_left=0|1
sonar_left_dist=near|far|very_far
sonar_right=0|1
sonar_right_dist=near|far|very_far
sonar_both=0|1
contact_detected=0|1
contact_left=0|1
contact_right=0|1
contact_both=0|1
estimated_distance_from_edge_in_front=near|far|very_far
estimated_distance_from_edge_in_back=near|far|very_far
estimated_distance_from_edge_to_left=near|far|very_far
estimated_distance_from_edge_to_right=near|far|very_far

The goal is to identify the state where the reward signal is received, and learn a policy to acquire that reward as quickly as possible. In a traditional Markov model, this state space represented discretely would have 2985984 possible values, which is far too much to explore each and every one using something like Q-learning or SARSA.

Can anyone recommend a reinforcement library appropriate for this domain (preferably with Python bindings) or an unimplemented algorithm that I could potentially implement myself?

Don Reba · Answer 1 · 2014-10-28T08:59:09.933

3

Your actual state is the robot's position and orientation in the world. Using these sensor readings is an approximation, since it is likely to render many states indistinguishable.

Now, if you go down this road, you could use linear function approximation. Then this is just 24 binary features (12 0|1 + 6*2 near|far|very_far). This is such a small number that you could even use all pairs of features for learning. Farther down this road is online discovery of feature dependencies (see Alborz Geramifard's paper, for example). This is directly related to your interest in hierarchical learning.

An alternative is to use a conventional algorithm to track the robot's position and use the position as input to RL.

edited Oct 28 '14 at 08:59

answered Oct 28 '14 at 05:08

Don Reba

13,814
3
48
61

Could you recommend an introduction to linear function approximation in the context of RL? I'm not familiar with it. And yes, the state approximation is intentional, since its meant to simulate a robot in the "real world", which will never truly know its absolute Euclidean position and orientation. – Cerin Oct 28 '14 at 14:06
I would recommend the classic Sutton and Barto. http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html – Don Reba Oct 28 '14 at 15:38
I stumbled on this answer "you could even use all pairs of features". That sounds like (2^24)^2 = 2^48 ~ 10^14. That is not small.. – WestCoastProjects Oct 29 '14 at 15:18
@javadba, with linear function approximation, it's just 24 features, not 2²⁴. 24²=576 is still peanuts. – Don Reba Oct 29 '14 at 19:05

Implementations of Hierarchical Reinforcement Learning

1 Answers1

Linked