I am facing the problem of sorting Solr results based on a user click log: I would like the most frequently clicked results to come first. Does anyone know how to configure or implement such a feature in Solr?
Thank you very much.
Good question. Your problem can be considered a classic Collective Intelligence (or Wisdom of Crowds) problem.

The first step is to maintain a click count for each (query, URL) pair: each time a user clicks a particular URL for a given query, the count for that tuple is incremented by 1.

As a second step, Solr returns results ranked by its own relevance algorithms (e.g. the vector space model). On top of that, for each (query, URL) pair returned, you devise a formula that adds a certain value, based on the number of clicks, to the score Solr gave the document. You then display the results ordered by the total obtained.
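The first step (counting clicks per (query, URL) pair) could look roughly like the sketch below. It is only an in-memory illustration; the class name `ClickCounter` and the query normalization (lowercasing, collapsing whitespace) are hypothetical choices, and a real system would persist the counts somewhere durable.

```python
from collections import Counter


def normalize_query(query: str) -> str:
    # Assumed normalization: treat "iPhone  Plan" and "iphone plan" as the same query.
    return " ".join(query.lower().split())


class ClickCounter:
    """Hypothetical in-memory click count per (query, url) pair."""

    def __init__(self) -> None:
        self.counts = Counter()

    def record_click(self, query: str, url: str) -> None:
        # Each click increments the count for this (query, url) tuple by 1.
        self.counts[(normalize_query(query), url)] += 1

    def clicks(self, query: str, url: str) -> int:
        # Counter returns 0 for pairs that were never clicked.
        return self.counts[(normalize_query(query), url)]
```

For instance, calling `record_click("iphone plan", "att")` twice makes `clicks("iphone plan", "att")` return 2, where `"att"` stands in for a real URL.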
Total score of a document = score given by Solr + click-based value computed by you.
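Written out, one possible form of this formula is shown below. The weight w and the function f are assumptions you would tune yourself; they are not part of Solr.

```latex
\text{score}_{\text{total}}(q, d) = \text{score}_{\text{Solr}}(q, d) + w \cdot f\big(c(q, d)\big)
```

Here c(q, d) is the number of clicks on document d for query q, w ≥ 0 controls how much clicks matter relative to Solr's relevance score, and f is a non-decreasing function such as f(c) = c or f(c) = log(1 + c). With f(c) = c a popular document can swamp relevance entirely; the logarithm grows slowly, so each additional click counts for less.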
For example, suppose that when you search for "iphone plan", Solr returns links from Apple, AT&T and Amazon, in that order from highest to lowest rank.
Now you check the number of clicks for each (query, URL) pair, i.e. {"iphone plan", Apple}, {"iphone plan", AT&T} and {"iphone plan", Amazon}, and find that for this query AT&T has been clicked more often than Apple. Using your formula, which gives some weight to clicks, you rerank the results above and change their display order.
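A minimal sketch of that reranking step, assuming f(c) = log(1 + c) and a hypothetical weight of 0.2. The Solr scores and click counts in the usage example are made-up illustration values, not real data.

```python
import math
from typing import Dict, List, Tuple


def rerank(
    results: List[Tuple[str, float]],
    clicks: Dict[str, int],
    weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Return (doc, total_score) sorted by Solr score + weight * log(1 + clicks)."""
    rescored = [
        (doc, solr_score + weight * math.log1p(clicks.get(doc, 0)))
        for doc, solr_score in results
    ]
    # sorted() is stable (also with reverse=True), so ties keep Solr's order.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores and click counts for the query "iphone plan".
solr_results = [("apple", 2.0), ("att", 1.8), ("amazon", 1.5)]
click_counts = {"apple": 5, "att": 50}
ranked = rerank(solr_results, click_counts, weight=0.2)
```

With these numbers AT&T ends up first (1.8 + 0.2 · ln 51 ≈ 2.59 versus 2.0 + 0.2 · ln 6 ≈ 2.36 for Apple), while Amazon, with no clicks, keeps its Solr score of 1.5.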
Note, however, that the formula you devise should be hard for spammers to game; otherwise they could change the entire ranking of your website by generating an enormous number of clicks on a particular document (say, by using a robot :)).
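One way to make the click value harder to inflate is sketched below, assuming your click log records some user or session identifier (the `user_id` field is hypothetical). Each user counts at most once per (query, URL) pair, the count is capped, and the result is damped with a logarithm. This only raises the bar: a robot that rotates identifiers can still inflate the counts.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def robust_click_scores(
    click_log: Iterable[Tuple[str, str, str]],
    cap: int = 1000,
) -> Dict[Tuple[str, str], float]:
    """click_log yields (user_id, query, url); returns a damped score per (query, url)."""
    voters: Dict[Tuple[str, str], Set[str]] = {}
    for user_id, query, url in click_log:
        # A user counts once per (query, url), so repeated clicks add nothing.
        voters.setdefault((query, url), set()).add(user_id)
    # Cap the number of distinct users, then damp it with log(1 + n).
    return {
        pair: math.log1p(min(len(users), cap))
        for pair, users in voters.items()
    }
```

In this sketch a single bot clicking one result a thousand times contributes the same as one ordinary user.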
That is the logic. There are two ways to implement it:
1. Change the Lucene Similarity class (http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/all/org/apache/lucene/search/Similarity.html), i.e. first understand how Lucene does the ranking and then embed your module into it.
2. Implement it as a standalone routine on top of Solr.
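For the standalone approach, a rough sketch is a routine that takes Solr's JSON response and reorders it. It assumes the request used `wt=json` and `fl=id,score` (Solr includes the relevance `score` only when it is requested in `fl`), that the unique key field is named `id`, and that click counts are keyed by (query, id); the function name and the weight of 0.2 are hypothetical.

```python
import json
import math
from typing import Dict, List, Tuple


def rerank_solr_response(
    raw_json: str,
    clicks: Dict[Tuple[str, str], int],
    query: str,
    weight: float = 0.2,
    id_field: str = "id",
) -> List[str]:
    """Reorder docs from a Solr JSON response by score + weight * log(1 + clicks)."""
    docs = json.loads(raw_json)["response"]["docs"]

    def total(doc: dict) -> float:
        n = clicks.get((query, doc[id_field]), 0)
        return doc["score"] + weight * math.log1p(n)

    # Stable sort: documents with equal totals keep Solr's original order.
    return [doc[id_field] for doc in sorted(docs, key=total, reverse=True)]
```

Fetching the JSON (for example with `urllib.request` against your Solr `/select` handler) is left out so the reranking logic stays independent of how you talk to Solr. Note that only the page of results Solr returned gets reordered, so you may want to request more rows than you display.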
Note: getting the counts for (query, URL) pairs is not easy if you have a huge volume of log data; in that case you will need to write some MapReduce jobs to compute them.
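The counting job has the shape of a word count: the map step emits ((query, URL), 1) for every click and the reduce step sums the values per key. The sketch below runs both phases locally in plain Python to show that shape; the tab-separated log format (timestamp, user id, query, URL) is a hypothetical assumption, and on Hadoop the framework would do the shuffle between the two phases.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, Tuple

Pair = Tuple[str, str]


def map_phase(line: str) -> Iterator[Tuple[Pair, int]]:
    # Hypothetical log format: timestamp \t user_id \t query \t url
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        return  # skip malformed lines
    _, _, query, url = parts
    yield (query.lower(), url), 1


def reduce_phase(pairs: Iterable[Tuple[Pair, int]]) -> Dict[Pair, int]:
    # Sum the emitted values for each (query, url) key.
    totals: Dict[Pair, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_clicks(lines: Iterable[str]) -> Dict[Pair, int]:
    # Local simulation: map every line, then reduce over all emitted pairs.
    return reduce_phase(chain.from_iterable(map_phase(line) for line in lines))
```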