I need to sort Solr results based on a user click log, so that the most frequently accessed results come first. Does anyone know how to configure or implement this in Solr?

Thank you very much.

Yavar
Kp Gupta
  • What do you mean by more accessed results? Do you maintain a view count in your index that would help you identify them? – Jayendra Mar 29 '12 at 06:43
  • I don't know how to maintain the view count in Solr. That is, how do I send a request back to Solr when a user clicks on a particular result? – Kp Gupta Mar 29 '12 at 06:51
  • I asked a similar question: http://stackoverflow.com/questions/8411860/can-solr-boost-results-on-number-of-social-likes – Jesvin Jose Mar 29 '12 at 13:09

1 Answer

Good question. Your problem can be seen as a classic collective intelligence (wisdom of the crowd) problem. The first step is to keep a count of the URLs clicked for each query, i.e. for each (query, URL) pair you maintain a click count. Each time a user clicks on a particular URL for a given query, the count for that pair is incremented by 1. As a second step, when Solr returns results ordered by its own ranking and relevance algorithms (for example, the vector space model), you apply a formula on top of that: for each (query, URL) pair returned, add a value based on the number of clicks to the rank Solr gave the document, and then display the results ordered by the total rank.

Total rank for a document = rank given by Solr + click-based numeric value computed by you.

For example, when you search for "iphone plan", Solr returns the following links, ordered from highest rank to lowest:

  1. Apple
  2. AT&T
  3. Amazon

Now you look up the number of clicks for each (query, URL) pair, i.e. {"iphone plan", Apple}, {"iphone plan", AT&T} and {"iphone plan", Amazon}, and find that for this query AT&T has more clicks than Apple. Using your own formula, which gives some weight to clicks, you rerank the results above and change their display order.
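The reranking step above can be sketched as follows. This is only an illustration under stated assumptions: the Solr scores, the click counts and the `click_weight` value are made up, and `rerank` is a hypothetical helper, not part of Solr.

```python
def rerank(query, solr_results, click_counts, click_weight=0.1):
    """Reorder Solr results by (Solr score + click_weight * clicks).

    solr_results: list of (url, solr_score) in Solr's original order.
    click_counts: dict mapping (query, url) -> number of clicks.
    click_weight: assumed tuning constant; choose it for your own data.
    """
    rescored = []
    for url, solr_score in solr_results:
        clicks = click_counts.get((query, url), 0)
        rescored.append((url, solr_score + click_weight * clicks))
    # Highest total rank first.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


# Illustrative numbers only (not real Solr output).
solr_results = [("Apple", 3.0), ("AT&T", 2.5), ("Amazon", 2.0)]
click_counts = {
    ("iphone plan", "Apple"): 4,
    ("iphone plan", "AT&T"): 20,
    ("iphone plan", "Amazon"): 1,
}
reranked = rerank("iphone plan", solr_results, click_counts)
```

With these made-up numbers AT&T moves above Apple, because its click bonus outweighs the difference in Solr score.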

However, note that the formula you devise should not be easy for spammers to exploit: someone could distort the entire ranking of your website by generating an enormous number of clicks for a particular document (say, by using a robot :)).
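One possible way to limit the effect of click spam, shown here only as a sketch, is to count each user at most once per (query, URL) pair and to dampen the counts with a logarithm and a cap. The function names, the `cap` value and the `(user_id, query, url)` event format are assumptions made for this illustration.

```python
import math


def count_unique_clicks(click_events):
    """Count clicks per (query, url), at most one per user.

    click_events: iterable of (user_id, query, url) tuples
    (a hypothetical log format).
    """
    seen = set()
    counts = {}
    for user_id, query, url in click_events:
        key = (user_id, query, url)
        if key in seen:
            continue  # repeated clicks by the same user are ignored
        seen.add(key)
        counts[(query, url)] = counts.get((query, url), 0) + 1
    return counts


def click_bonus(clicks, weight=1.0, cap=1000):
    """Dampened click value: weight * ln(1 + min(clicks, cap))."""
    capped = min(max(clicks, 0), cap)
    return weight * math.log1p(capped)
```

A robot that clicks 500 times under one user id then contributes a single click, and even a very large genuine count adds only a bounded bonus. This does not stop a robot that rotates user ids, so it is a mitigation rather than a complete defence.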

That is the logic. There are two ways to go about implementing it:

  1. Change the Lucene Similarity class (http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/all/org/apache/lucene/search/Similarity.html), i.e. first understand how Lucene does the ranking and then embed your module into it.

  2. Implement it as a standalone routine on top of Solr.
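As a simpler variation that goes beyond the two options above: if a per-document click total is stored in the index, Solr's edismax query parser can add a function of that field to the score through its `bf` (boost function) parameter. The sketch below only builds the query string. The field name `click_count` is an assumption, and note that this boosts by clicks per document, not per (query, URL) pair.

```python
from urllib.parse import urlencode


def build_boosted_query(user_query, click_field="click_count", rows=10):
    """Build select-handler parameters that add a click-based boost.

    click_field is a hypothetical numeric field holding the click total;
    sum(field,1) keeps log() away from zero for unclicked documents.
    """
    params = {
        "defType": "edismax",
        "q": user_query,
        "bf": f"log(sum({click_field},1))",
        "fl": "*,score",
        "rows": rows,
    }
    return urlencode(params)


query_string = build_boosted_query("iphone plan")
```

The resulting string would be appended to the select URL of your own Solr core. Keeping the field current means updating the document in Solr whenever clicks are recorded, for example in periodic batches.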

Note: Getting the counts for (query, URL) pairs is not easy if you have a very large amount of data. In that case you would need to write some MapReduce jobs to compute them.
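The counting job has the usual map/reduce shape, sketched below in plain Python on a single machine. The tab-separated `query<TAB>url` log format and the function names are assumptions; a real job would run the same two phases on a framework such as Hadoop.

```python
from collections import defaultdict


def map_click(log_line):
    """Map phase: emit ((query, url), 1) for one click-log line.

    Assumes a hypothetical 'query<TAB>url' line format.
    """
    query, url = log_line.rstrip("\n").split("\t")
    # Normalize the query so "iPhone Plan" and "iphone plan" share a key.
    yield (query.strip().lower(), url), 1


def reduce_counts(pairs):
    """Reduce phase: sum the emitted values per (query, url) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_clicks(log_lines):
    mapped = (pair for line in log_lines for pair in map_click(line))
    return reduce_counts(mapped)
```

Because the reduce step only sums values per key, the log can be split across machines and the partial totals merged afterwards.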

Yavar
  • We are not using Lucene, so how can we get the results directly using PHP scripts? – Kp Gupta Mar 29 '12 at 10:15
  • @KpGupta: Lucene is the engine behind Solr that does the ranking and relevance work for you. As I mentioned, click-based ranking is not provided out of the box; you will have to write the code/algorithm for it yourself. – Yavar Mar 29 '12 at 10:21
  • We are ready to write the code, but we don't know how to send the call back to Solr. – Kp Gupta Mar 29 '12 at 10:58