In Elasticsearch, the length of a document matters a lot to the final score of the search results. A match in a field that is just one line long will score much higher than a single match in, say, a document with 5 pages of text. Is there a way to override this behavior, or to reliably and repeatably boost results to overcome it?

concept47
  • Re-reading the question after posting my answer...do you want to disable the fact that the length is taken into account or give more importance to longer matching fields? In fact it's about length of the matching field when computing the score, not about the size of the whole document. – javanna Jul 05 '13 at 21:04
  • Do both require different approaches? If so, what is the difference in approach? – concept47 Jul 08 '13 at 22:10
  • The difference is that if you disable norms, length is no longer taken into account, so a long field scores the same as a short field. If you want to give more importance to long fields, that's not so straightforward; you can achieve it with a custom_score query and a script. – javanna Jul 09 '13 at 06:53

1 Answer

I guess you mean that the length of the matching field is taken into account when computing the score. If you just want to disable this behaviour, you can omit norms while indexing. That way you lose index-time boosting as well, but I guess you're not using it, and even if you need boosting you should use query-time boosting, which is far more flexible.

You have to update the mapping for your field like this:

"field_name" : {
    "type" : "string",
    "omit_norms" : true
}

If you want to override this default behaviour for all your string fields you can use a dynamic template like this:

{
    "type_name" : {
        "dynamic_templates" : [
            {
                "omit_norms_template" : {
                    "match_mapping_type" : "string",
                    "mapping" : {
                        "omit_norms" : true
                    }
                }
            }
        ]
    }
}
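
For the query-time boosting mentioned above, here is a minimal sketch using the custom_score query available in Elasticsearch 0.90.x (later releases replace it with function_score). The numeric field business_boost is hypothetical, standing in for whatever per-document weight your business logic produces; field_name and the query text are placeholders too:

{
    "query" : {
        "custom_score" : {
            "query" : {
                "match" : { "field_name" : "search terms" }
            },
            "script" : "_score * doc['business_boost'].value"
        }
    }
}

Because the script reads a regular document field rather than the norms, it keeps working with norms omitted, and you can change the formula at search time without reindexing.
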
javanna
  • Actually I need index-time boosting :\ Is there a way to apply this at query time? What are the implications? – concept47 Jul 05 '13 at 21:22
  • Ouch... with omit_norms you are losing index time boosting, I would switch to query time boosting. There are different ways to do it with elasticsearch. How are you boosting your docs? Based on what? – javanna Jul 05 '13 at 21:26
  • okay, I'll try query time boosting ... I'm boosting my docs based on arbitrary business logic – concept47 Jul 06 '13 at 21:21
  • can you show me some examples of query time boosting with elasticsearch? I can't seem to find any good ones. – concept47 Jul 08 '13 at 22:09
  • Have a look at this other answer: http://stackoverflow.com/questions/12427449/elasticsearch-boosting-relevance-based-on-field-value/12430664#12430664 – javanna Jul 09 '13 at 07:04
  • This didn't work as intuitively as the index boosting for some reason. It seems that the dis_max = true setting uses just one of the criteria to boost a doc, and this produces different results from the index boost, even though I'm using the exact same weights. Frustrating; I just want to stop the length_norm from being taken into consideration with large documents. – concept47 Jul 09 '13 at 20:31
  • Thanks for all the help @javanna. I'll accept your answer even though it didn't work for my case. – concept47 Jul 09 '13 at 23:30
  • That's weird, if you have a more specific problem that relates to your documents you can maybe ask another question. – javanna Jul 10 '13 at 08:20