
I am trying to solve a classification problem. Many classical approaches seem to follow a similar paradigm: train a model on some training set and then use it to predict the class labels of new instances.

I am wondering whether it is possible to introduce some feedback mechanism into this paradigm. In control theory, introducing a feedback loop is an effective way to improve system performance.

Currently, a straightforward approach I have in mind is this: start with an initial set of instances and train a model on them. Then, each time the model makes a wrong prediction, add the misclassified instance to the training set. This is different from blindly enlarging the training set because it is more targeted. In the language of control theory, it can be seen as a kind of negative feedback.
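
Here is a minimal sketch of that loop, assuming scikit-learn; the synthetic dataset, the choice of logistic regression, and the set sizes are just placeholders, not part of the idea itself:

```python
# Negative-feedback loop: only misclassified instances are fed back into training.
# Assumes scikit-learn; dataset, classifier and sizes are illustrative placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_init, y_init = X[:200], y[:200]          # small initial training set
X_stream, y_stream = X[200:], y[200:]      # "new instances" seen later

train_X, train_y = list(X_init), list(y_init)
model = LogisticRegression(max_iter=1000).fit(X_init, y_init)

for xi, yi in zip(X_stream, y_stream):
    if model.predict(xi.reshape(1, -1))[0] != yi:
        # feedback: the wrongly predicted instance joins the training set
        train_X.append(xi)
        train_y.append(yi)
        model = LogisticRegression(max_iter=1000).fit(np.array(train_X), np.array(train_y))
```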

Is there any research going on with the feedback approach? Could anyone shed some light?

smwikipedia
  • Look up boosting; this is basically what you are describing. – Sean Owen Apr 06 '14 at 00:15
  • Should it be migrated to http://stats.stackexchange.com/? – sashkello Apr 09 '14 at 02:10
  • smwikipedia: I am after exactly the same problem. http://stackoverflow.com/questions/36068292/incorporating-user-feedback-in-a-ml-model. Would you like to share your findings? – Anuj Gupta Jul 23 '16 at 03:06
  • @AnujGupta My question was inspired by the *negative feedback theory* in control theory. This question has been around for a while, but I didn't dig into it much due to a project shift. I suggest reading the replies below, especially the one I granted the bounty to. Sorry for not being able to help you much. – smwikipedia Jul 24 '16 at 02:25

3 Answers


There are two areas of research that spring to mind.

The first is Reinforcement Learning. This is an online learning paradigm that allows you to get feedback and update your policy (in this instance, your classifier) as you observe the results.
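
A toy sketch of that kind of loop, treating the classifier as a policy that is nudged by a reward signal each time it acts; the epsilon-greedy rule, the linear scorer and the synthetic data are purely illustrative, not a specific published method:

```python
# Toy "policy + reward feedback" loop: epsilon-greedy choice of a class label,
# followed by a reward-weighted update of a per-class linear scorer.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_classes, epsilon, lr = 20, 2, 0.1, 0.05
W = np.zeros((n_classes, n_features))      # one linear scorer per class = the "policy"

for t in range(5000):
    x = rng.normal(size=n_features)
    true_label = int(x[:5].sum() > 0)      # synthetic ground truth
    if rng.random() < epsilon:
        action = int(rng.integers(n_classes))   # explore occasionally
    else:
        action = int(np.argmax(W @ x))          # otherwise act greedily
    reward = 1.0 if action == true_label else -1.0
    W[action] += lr * reward * x           # feedback updates the chosen scorer
```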

The second is active learning, where the classifier gets to select examples from a pool of unlabelled examples to have them labelled. The key is to have the classifier choose for labelling the examples that will best improve its accuracy, typically the difficult examples under the current classifier hypothesis.
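
A minimal sketch of pool-based active learning with uncertainty sampling; scikit-learn, the synthetic data and the query budget are only placeholders:

```python
# Pool-based active learning: repeatedly query the label of the example the
# current classifier is least certain about. Sizes and classifier are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labelled = list(range(50))                 # indices whose labels we already have
pool = list(range(50, len(X)))             # unlabelled pool

model = LogisticRegression(max_iter=1000)
for _ in range(20):                        # query budget
    model.fit(X[labelled], y[labelled])
    proba = model.predict_proba(X[pool])[:, 1]
    hardest = pool[int(np.argmin(np.abs(proba - 0.5)))]   # most uncertain example
    labelled.append(hardest)               # "ask the oracle" for its label
    pool.remove(hardest)
```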

Ben Allison

I have used such feedback for every machine-learning project I have worked on. It allows training on less data (so training is faster) than selecting data randomly, and model accuracy also improves faster than with randomly selected training data. I work on image-processing (computer vision) data, so one other kind of selection I do is to add clustered false (wrong) data instead of adding every single false sample. This is because I assume I will always have some failures, so my definition of positive data is that it is clustered in the same area of the image.
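
A rough sketch of the "add only the hard failures" part of this idea; the clustering step is omitted, GradientBoostingClassifier is only a stand-in for whatever boosted classifier is actually used, and the dataset and threshold are illustrative:

```python
# Batch version of the feedback idea: mine the negatives the current model gets
# confidently wrong ("false detections") and retrain with them added.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=5000, n_features=30, weights=[0.95], random_state=1)
rng = np.random.RandomState(0)
train_idx = rng.choice(len(X), 500, replace=False)        # initial training subset
model = GradientBoostingClassifier().fit(X[train_idx], y[train_idx])

scores = model.predict_proba(X)[:, 1]
hard_negatives = np.where((y == 0) & (scores > 0.6))[0]   # confident false positives
augmented = np.concatenate([train_idx, hard_negatives])
model = GradientBoostingClassifier().fit(X[augmented], y[augmented])
```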

rold2007
  • I do not think this will work for every machine-learning method. The fact that training it on failed data points makes it better is not obvious (maybe it starts to fail on all the points outside that set). Your experience is just one data point; please support it with some academic research references. Also mention which exact methods you used, since the behaviour could be strikingly different. Otherwise I find it doubtful, as my experience tells me this works less often than it doesn't, but I'm just another data point... – sashkello Apr 09 '14 at 02:08
  • If every answer on SO had to be supported by academic research references, there would only be a handful of accepted answers. I applied this technique to OCR, image similarity and pedestrian detection. I used GentleBoost. When working on images, the number of negative samples is almost infinite while the number of positive samples is quite limited, so using randomly selected data out of an infinity of possibilities is inefficient. The training time will be longer and the accuracy will be lower. – rold2007 Apr 09 '14 at 04:08
  • Not every answer on SO should be backed by a reference, but this one should; otherwise it is merely the opinion of one person. So you used this technique for three projects: mention that in the answer, along with which exact problems it helped with. "Every machine-learning project I worked on" is very broad. If you worked on three, that is very little data for such a claim. I have worked on >20, but I simply don't know what the answer is, because it is always different for me. Yes, I might be doing something wrong, which is all the more reason for you to specify what exactly you did so that the answer becomes useful. – sashkello Apr 09 '14 at 04:38
  • Also, "Is there any research going on with the feedback approach?" is in the question, so OP expects some references to literature on this topic. Mention how this method is called, give some links for further reading, etc. At the moment this answer is equivalent to "yes, it helped me a few times" which is rather a comment. – sashkello Apr 09 '14 at 04:39

I saw this paper some time ago, which seems to be what you are looking for.

They essentially model classification problems as Markov decision processes and solve them with the ACLA algorithm. The paper goes into far more detail than I can here, but ultimately they get results that outperform a multilayer perceptron, so this looks like a pretty effective method.

Charles Menguy
  • It looks like the link to the paper is broken. Could you please provide a different working link, or a citation of the paper in the text as a replacement? The ACLA link that searches DeepDyve also does not return any useful results; please look into that as well. Thank you. – user1953384 May 22 '18 at 11:12
  • The links are broken; a citation is always better because it is immutable. – Shawn Cicoria Jun 19 '19 at 11:27