7

I am looking to research an appropriate algorithm for my purpose, can someone suggest a good learning algorithm for the following scenario:

A user can search for some word in a set of sentences. I will then return the top 10 sentences based on that keyword, I want the algorithm to allow user input, that is a user can click on the best sentences and this information will help the search algorithm to return more appropriate results in the future.

rwong
  • 6,062
  • 1
  • 23
  • 51
Neutralise
  • 507
  • 8
  • 17
  • 1
    Any more information about how the searches relate to the sentences would help. – BCoates Apr 09 '11 at 05:33
  • 1
    what features are you extracting or computing from the text, the queries, and the user's clicks? – Ron Apr 09 '11 at 06:01
  • A good starting point: [Apache Lucene](http://lucene.apache.org/java/docs/index.html) – rwong Apr 09 '11 at 07:04
  • 1
    @rwong, I already use lucene and I use inverse document frequency and query expansion to rank the sentences based on a given keyword. This works fine. However once they are ranked based on this method, I want the user to be able to select sentences which are most correct for them. That is allow the user to train the search and the search will then learn from those relevant sentences and use that knowledge in the future. – Neutralise Apr 09 '11 at 08:29

3 Answers3

2

Seems like you want to use user feedback to improve some kind of search results. If that is right you should take a look at Rocchio. You could, also, maintain a list of "clicked" sententes for each keyword. That way you can boost the "clicked" ones positions in the rank.

Felipe Hummel
  • 4,674
  • 5
  • 32
  • 35
1

You might find this chapter in the Qi II manual useful:

6.3 Property Lists

The chapter discusses the use of semantic nets to store and sort data. I also recommend the exercises at the back of the chapter; they may give you some ideas, no matter what language you're writing in.

CSP
  • 31
  • 1
0

Look into some sort of matrix factorization technique, like singular value decomposition or non-negative matrix factorization.

BCoates
  • 1,508
  • 10
  • 16