
My problem is that search is a small addition to my application and I don't really want to invest much time digging into the whole subject. Looking at my search results, there's a very common pattern: I get some very good matches (scoring 7+) and some very, very bad ones, which score around 0.10. If I want to sort the results by any criterion other than score, it makes very little sense, as the 0.10 results have almost nothing to do with the query and might end up first on the list.

Seriously, it looks like cutting everything below a score of around 3 would make my results far more consistent, and sorting would make much more sense.

Now, after some basic research, it looks like lots of people think that filtering Solr results by score is a really bad idea. There are some hints on how to do it, but I couldn't find a working solution yet.

The suggested ideas using frange (on either the proper q query or qf) don't really work. Ditching the low-score results in the app itself seems pretty dull as well, since it would break pagination, slow things down and in general result in a lot of unnecessary work.
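For reference, the frange variant that usually gets suggested wraps the main query in a function query and filters on its score; a typical form (with a hypothetical threshold of 3) looks like:

```
select?q=foo+bar&fq={!frange l=3}query($q)
```

Here `{!frange l=3}` is meant to keep only documents where the function value is at least 3, and `query($q)` re-evaluates the main query as that function. This is the form people post; as noted above, it didn't work for me.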

After roughly an hour on Google I found that a lot of people really want this solved, though I couldn't find anything that works for me.

So, is there any way at all to ditch low-score results on the Solr side? Are there any custom filters to do that?

Edit:

Most of the results have a significant score gap at the bottom for some reason. For example, the last relevant result gets, say, a 4.5 score, and there are always a few more results with the next highest at 0.12... Maybe I am doing something wrong at the index level? Is there any simple way to push those irrelevant results off the result set? After some more research it looks like I would be more or less OK after just ditching the < 1 scores...

mdrozdziel
  • For one thing, the score has meaning only in a comparative sense, not in an absolute sense. A "good" result may even have a score of .2 for certain searches. So you need to determine the threshold empirically. And setting a cutoff may actually block valid results and (IMO) is psychologically equivalent to a girlfriend who refuses to talk to you. So if you do set a threshold, show the results (and listed pages) below the threshold grayed out ( [see an example](http://stackoverflow.com/questions/209170/how-much-does-it-cost-to-develop-an-iphone-application) ) – Jesvin Jose Feb 10 '12 at 07:34
  • I am aware that the decision on the threshold is tricky... How do you solve the problem of sorting in that case in general? Sorting first by score and then by price makes no sense, as score is a float. Even mapping results to score ranges is silly, because to a user it looks like the sorting is broken. I am perfectly fine with killing some valid results. For my case this is far better than showing completely irrelevant items at the top of the list just because they start with an A. Anyone with an idea how to solve that in Solr? – mdrozdziel Feb 10 '12 at 09:24
  • Boost functions and queries will boost the **search score itself** based on numerical values of fields and occurrences of terms. You can set the price to influence the score. (BTW I have never used those features, so I don't speak from experience) – Jesvin Jose Feb 10 '12 at 09:49
  • I'd suggest you work on the way you boost documents. This [link](http://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_negative_.28or_very_low.29_boost_to_documents_that_match_a_query.3F) might be useful. – javanna Feb 10 '12 at 10:09
  • @javanna Thanks for the link, but I have tens of thousands of long documents and search is not the main focus by any means. Micromanaging every single document/query behind the scenes is pretty much impossible. I've checked several different cases, and the results are pretty much the same every time: perfect matches at the top and lousy, irrelevant stuff at the end. I would gladly just ditch those for now and be more than happy. After cross-checking tens of queries it seems like putting the bar at 3, maybe 4, would solve the case. Seriously, is there no way to do it at the search level? – mdrozdziel Feb 10 '12 at 11:14
  • How do you influence the score? At query time or index time? – javanna Feb 10 '12 at 12:01
  • So far only at index time, because I don't need to offer any fancy input interface to the user. A typical query is composed of just 1-3 words, nothing fancy. – mdrozdziel Feb 10 '12 at 14:06
  • I assume you are using dismax? Did you try tweaking the mm to get fewer but more relevant results? – Okke Klein Feb 11 '12 at 00:43
  • To get good answers I think you should explain in a little more depth; an example could also help, and of course your configuration/schema. – javanna Feb 13 '12 at 22:12

1 Answer


Bailing out at the application level seems to be what most folks do.

One idea is to pick a ratio you like, take the first doc's score as the denominator and each subsequent doc's score as the numerator, and stop once you fall below your ratio. But I agree that doing it at this level does mess up paging, etc.
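That ratio cutoff can be sketched in a few lines at the application level; the `(doc_id, score)` pair structure and the 25% threshold here are made up for illustration:

```python
def cut_by_ratio(results, min_ratio=0.25):
    """Keep only docs whose score is at least min_ratio of the top
    document's score. `results` is assumed to be a list of
    (doc_id, score) pairs already sorted by score, descending."""
    if not results:
        return []
    top_score = results[0][1]  # best match: the denominator
    kept = []
    for doc_id, score in results:
        if score / top_score < min_ratio:
            break  # sorted input, so everything after is worse
        kept.append((doc_id, score))
    return kept

hits = [("a", 7.2), ("b", 4.5), ("c", 0.12), ("d", 0.10)]
print(cut_by_ratio(hits))  # keeps ("a", 7.2) and ("b", 4.5)
```

Because the cutoff point depends on the top score of each individual query, this adapts to queries with different absolute score ranges, which is exactly why an absolute threshold is considered fragile.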

Another idea is to write a custom Solr plugin that forces the score to zero below some point; that would keep pagination, facets, etc. intact. The place to start would be the default "Similarity" scoring code (the name is a bit odd; I had passed it by a few times myself).

Mark Bennett