In Q-learning with function approximation, is it possible to avoid hand-crafting features?

Question

I have little background knowledge of Machine Learning, so please forgive me if my question seems silly.

Based on what I've read, Q-learning is one of the best-known model-free reinforcement learning algorithms. Each state-action pair in the agent's world is assigned a q-value, and in each state the agent chooses the action with the highest q-value. The q-value is then updated as follows:

$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left(R(s,a,s') + \gamma \max_{a'} Q(s',a')\right)$$

where $\alpha$ is the learning rate and $\gamma$ is the discount factor.
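For concreteness, here is a minimal tabular sketch of that update and of greedy action selection. The dictionary-based table, the default value of 0.0 for unseen pairs, and the names `q_update` and `greedy_action` are illustrative assumptions; terminal states and exploration (e.g. ε-greedy) are left out.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Tabular Q-learning: Q(s,a) <- (1-alpha)Q(s,a) + alpha(r + gamma * max_a' Q(s',a'))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]


def greedy_action(Q, s, actions):
    """Pick the action with the highest q-value in state s."""
    return max(actions, key=lambda a: Q[(s, a)])


Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
Q[("B", "right")] = 1.0
new_value = q_update(Q, "A", "right", r=0.0, s_next="B", actions=actions, alpha=0.5, gamma=0.9)
print(new_value)  # 0.45
```

The table needs one entry per (state, action) pair, which is exactly what stops scaling in the next paragraph.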

Apparently, for high-dimensional problems the number of states becomes astronomically large, making it infeasible to store a table of q-values.

So a practical implementation of Q-learning has to approximate the q-values by generalizing over states using features. For example, if the agent were Pacman, the features might be:

Distance to closest dot
Distance to closest ghost
Is Pacman in a tunnel?

Then, instead of storing a q-value for every single state, you would only need to learn one weight per feature, and the q-value of a state-action pair would be computed from its feature values.
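With linear function approximation, one common way to make this precise, the q-value is a weighted sum of feature values, $Q(s,a) = \sum_i w_i f_i(s,a)$, and Q-learning updates the weights instead of table entries. A minimal sketch, assuming the features have already been computed (and roughly normalized) into a dict; the feature names and numbers are hypothetical stand-ins for the Pacman features listed above.

```python
def q_value(weights, features):
    """Linear approximation: Q(s, a) = sum_i w_i * f_i(s, a)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def approx_q_update(weights, features, reward, next_feature_sets, alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning step on the weights; returns the TD error.

    features: f(s, a) for the action actually taken in s.
    next_feature_sets: one f(s', a') dict per action available in s'
                       (pass an empty list if s' is terminal).
    """
    best_next = max((q_value(weights, f) for f in next_feature_sets), default=0.0)
    td_error = reward + gamma * best_next - q_value(weights, features)
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + alpha * td_error * value
    return td_error


# Hypothetical, pre-normalized Pacman features for one (state, action) pair.
weights = {}
f_sa = {"bias": 1.0, "dist_closest_dot": 0.5, "dist_closest_ghost": 0.2, "in_tunnel": 0.0}
f_next = [{"bias": 1.0, "dist_closest_dot": 0.25, "dist_closest_ghost": 0.4, "in_tunnel": 1.0}]
td = approx_q_update(weights, f_sa, reward=10.0, next_feature_sets=f_next)
# All weights start at 0, so td == 10.0 and each weight moves by alpha * td * f_i.
```

The number of weights equals the number of features no matter how many states exist, which is what lets this scale where the table could not.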

So my question is:

Is it possible for a reinforcement learning agent to create or generate additional features?

Some research I've done:

This post mentions A. Geramifard's iFDD (Incremental Feature Dependency Discovery) method, which is a way of "discovering feature dependencies", but I'm not sure whether that counts as feature generation, since the paper assumes that you start off with a set of binary features.
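To make "discovering feature dependencies" concrete: as I understand the iFDD idea, conjunctions of co-active binary features are candidate new features, each candidate accumulates the magnitude of the TD error seen while it is active, and it is added to the feature set once that total exceeds a threshold. The sketch below is a heavily simplified, hypothetical illustration of that idea, not the paper's full algorithm (it leaves out details such as how overlapping active features are handled and the weight update itself); the class name, threshold, and base-feature names are made up.

```python
from itertools import combinations


class SimpleIFDD:
    """Hypothetical, simplified iFDD-style discovery of conjunctive binary features."""

    def __init__(self, base_features, threshold=1.0):
        # Every feature is a frozenset of base-feature names; start with singletons.
        self.features = [frozenset([b]) for b in base_features]
        self.threshold = threshold
        self.relevance = {}  # candidate conjunction -> accumulated |TD error|

    def active(self, on_bits):
        """Features whose base features are all switched on in the current state."""
        return [f for f in self.features if f <= on_bits]

    def discover(self, on_bits, td_error):
        """Credit |td_error| to each pair of co-active features; promote pairs over the threshold."""
        added = []
        for f, g in combinations(self.active(on_bits), 2):
            candidate = f | g
            if candidate in self.features:
                continue  # already a feature (also covers f being a subset of g)
            self.relevance[candidate] = self.relevance.get(candidate, 0.0) + abs(td_error)
            if self.relevance[candidate] > self.threshold:
                self.features.append(candidate)
                added.append(candidate)
        return added


ifdd = SimpleIFDD(["ghost_near", "in_tunnel", "power_pellet_active"], threshold=1.0)
on = frozenset({"ghost_near", "in_tunnel"})
first = ifdd.discover(on, td_error=0.6)    # relevance 0.6: nothing added yet
second = ifdd.discover(on, td_error=-0.7)  # relevance 1.3 > 1.0: conjunction added
```

Note that this can only produce combinations of the binary features you start with, which is why it may not count as feature generation in the sense the question means.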

Another paper I found that seems apropos is Playing Atari with Deep Reinforcement Learning, which "extracts high level features using a range of neural network architectures".

I've read over the paper but still need to flesh out/fully understand their algorithm. Is this what I'm looking for?
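The core idea of learning features with a neural network can be shown at toy scale: the hidden layer maps raw input to features, and training the whole network on the Q-learning target with backpropagation adjusts those features together with the output weights. This is a minimal pure-Python sketch under that assumption, not the architecture from the Atari paper (whose networks are convolutional and trained with extra machinery such as experience replay); the class name, layer sizes, and learning rate are arbitrary choices.

```python
import math
import random


class TinyQNetwork:
    """One hidden layer of tanh units; the hidden activations act as learned features."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def features(self, x):
        """Hidden layer h = tanh(W1 x + b1): features learned from the raw input."""
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def q_values(self, x):
        """Linear read-out of the learned features: one q-value per action."""
        h = self.features(x)
        return [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.W2, self.b2)]

    def update(self, x, a, reward, x_next, done, lr=0.05, gamma=0.99):
        """Semi-gradient Q-learning step via backpropagation; returns the TD error."""
        target = reward if done else reward + gamma * max(self.q_values(x_next))
        h = self.features(x)
        q_sa = sum(w * hj for w, hj in zip(self.W2[a], h)) + self.b2[a]
        delta = target - q_sa  # gradient of 0.5 * delta**2 w.r.t. q_sa is -delta
        # Backpropagate into the hidden layer using the pre-update output weights.
        dz = [delta * self.W2[a][j] * (1.0 - h[j] ** 2) for j in range(len(h))]
        # Output layer: only the row for the taken action a receives a gradient.
        for j, hj in enumerate(h):
            self.W2[a][j] += lr * delta * hj
        self.b2[a] += lr * delta
        # Hidden layer: this is where the features themselves change.
        for j, dzj in enumerate(dz):
            for i, xi in enumerate(x):
                self.W1[j][i] += lr * dzj * xi
            self.b1[j] += lr * dzj
        return delta


net = TinyQNetwork(n_inputs=3, n_hidden=8, n_actions=2)
x = [1.0, 0.0, 0.5]  # hypothetical raw observation
for _ in range(300):
    net.update(x, a=1, reward=1.0, x_next=x, done=True)
q = net.q_values(x)  # q[1] is pulled toward the terminal reward 1.0
```

Here `features(x)` plays the role that the hand-crafted Pacman features play in the question, except that its values are shaped by the TD error rather than designed by hand.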

Thanks

score 4 · Accepted Answer · answered Dec 09 '14 at 13:13

It seems like you already answered your own question :)

Feature generation is not part of the Q-learning (or SARSA) algorithm itself. However, in a preprocessing step you can use a wide array of algorithms (some of which you mentioned) to generate or extract features from your data. Combining different machine learning algorithms this way results in a hybrid architecture, which is a term you might look into when researching what works best for your problem.

Here is an example of using features with SARSA (which is very similar to Q-learning). Whether the papers you cited are helpful for your scenario is something you'll have to decide for yourself; as always with machine learning, the right approach is highly problem-dependent. If you're working in robotics and it's hard to define discrete states manually, a neural network might be helpful. If you can come up with heuristics yourself (as in the Pacman example), you probably won't need one.
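With the same linear features, SARSA and Q-learning differ only in the bootstrap term: SARSA uses the q-value of the next action the agent actually takes, while Q-learning uses the maximum over next actions. A minimal sketch with dict-based features; the function names, feature names, and numbers are made up for illustration.

```python
def linear_q(weights, features):
    """Q(s, a) = w . f(s, a) over a dict of named feature values."""
    return sum(weights.get(k, 0.0) * v for k, v in features.items())


def sarsa_update(weights, f_sa, reward, f_next_sa, done, alpha=0.1, gamma=0.9):
    """Semi-gradient SARSA: bootstrap from the next action actually taken (no max)."""
    next_q = 0.0 if done else linear_q(weights, f_next_sa)
    td_error = reward + gamma * next_q - linear_q(weights, f_sa)
    for k, v in f_sa.items():
        weights[k] = weights.get(k, 0.0) + alpha * td_error * v
    return td_error


# Hypothetical features: the next action chosen by the policy lands next to a ghost.
weights = {"bias": 1.0, "ghost_close": -2.0}
f_sa = {"bias": 1.0, "ghost_close": 0.0}       # Q(s, a) = 1.0
f_next_sa = {"bias": 1.0, "ghost_close": 1.0}  # Q(s', a') = -1.0
td = sarsa_update(weights, f_sa, reward=0.0, f_next_sa=f_next_sa, done=False, alpha=0.5)
# td = 0.0 + 0.9 * (-1.0) - 1.0 = -1.9, so the bias weight drops to about 0.05.
```

Replacing `linear_q(weights, f_next_sa)` with the maximum q-value over all next actions turns this into the Q-learning update.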

Thanks for your response. Do you know any articles that describe how one might generate heuristics using a neural network? Or perhaps popular/generally accepted methods of feature generation? I've done searches on convolutional networks, multilayer perceptrons, restricted Boltzmann machines and recurrent neural networks - but they seem to be very broad topics. I'm currently learning about the backpropagation/feed-forward algorithm used in neural networks, but am having trouble thinking about how to apply them for feature generation. — cozos, Dec 09 '14 at 13:49
