68

I need to find a way in Elasticsearch to boost the relevance of a document based on the value of a particular field. Specifically, all my documents have a special field, and the higher its value, the more relevant the document that contains it should be, regardless of the search.

Consider the following document structure:

{
    "_all" : {"enabled" : "true"},
    "properties" : {
        "_id":            {"type" : "string",  "store" : "yes", "index" : "not_analyzed"},
        "first_name":     {"type" : "string",  "store" : "yes", "index" : "yes"},
        "last_name":      {"type" : "string",  "store" : "yes", "index" : "yes"},
        "boosting_field": {"type" : "integer", "store" : "yes", "index" : "yes"}
        }
}

I'd like documents with a higher boosting_field value to be inherently more relevant than those with a lower boosting_field value. This is just a starting point -- the matching between the query and the other fields will also be taken into account in determining the final relevance score of each doc in the search. But, all else being equal, the higher the boosting field, the more relevant the document.

Anyone have an idea on how to do this?

Thanks a lot!

Chris Dutrow
Clay Wardell
  • See also https://stackoverflow.com/a/41813578/5444623 for different boosting by field of document types – PeterM Nov 13 '17 at 22:04

4 Answers

74

You can boost either at index time or at query time. I usually prefer query time boosting even though it makes queries a little slower. Otherwise I'd need to reindex every time I want to change my boosting factors, which usually need fine-tuning and have to stay flexible.

There are different ways to apply query time boosting using the Elasticsearch query DSL:

  • Boosting Query
  • Custom Filters Score Query
  • Custom Boost Factor Query
  • Custom Score Query

The first three queries are useful if you want to give a specific boost to the documents that match specific queries or filters, for example if you want to boost only the documents published during the last month. You could use this approach with your boosting_field, but you'd need to define boosting_field intervals by hand and give each one a different boost, which isn't that great.
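To make the interval idea concrete, here is a minimal Python sketch that builds such a request body. It uses the newer function_score query with per-filter weights rather than the older queries discussed here, and the interval boundaries and weights are hypothetical values chosen only for illustration. Whether `weight` and `score_mode` are accepted depends on your Elasticsearch version, so treat this as a starting point.

```python
import json

# Hypothetical intervals: (lower bound inclusive, upper bound exclusive, weight).
# None as the upper bound means the interval is open-ended.
INTERVAL_BOOSTS = [
    (0, 1000, 1.0),
    (1000, 10000, 2.0),
    (10000, None, 3.0),
]


def interval_boost_query(inner_query, field="boosting_field"):
    """Build a function_score body that applies one weight per value interval."""
    functions = []
    for lower, upper, weight in INTERVAL_BOOSTS:
        bounds = {"gte": lower}
        if upper is not None:
            bounds["lt"] = upper
        functions.append({"filter": {"range": {field: bounds}}, "weight": weight})
    return {
        "query": {
            "function_score": {
                "query": inner_query,
                "functions": functions,
                # Use the weight of the first interval that matches.
                "score_mode": "first",
                # Multiply the text relevance score by that weight.
                "boost_mode": "multiply",
            }
        }
    }


body = interval_boost_query({"match_all": {}})
print(json.dumps(body, indent=2))
```

The drawback is visible in the table at the top: every boundary and weight has to be picked and maintained by hand.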

The best solution would be to use a Custom Score Query, which lets you run a query and customize its score with a script. It's quite powerful: the script can modify the score directly. First, I'd scale the boosting_field values to a range such as 0 to 1, so that the final score doesn't become a big number. To do that, you need to estimate roughly the minimum and maximum values the field can contain, say a minimum of 0 and a maximum of 100000. Once the boosting_field value is scaled to a number between 0 and 1, you can add the result to the actual score like this:

{
    "query" : {
        "custom_score" : {
            "query" : {
                "match_all" : {}
            },
            "script" : "_score + (1 * doc.boosting_field.doubleValue / 100000)"
        }
    }
}
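To see what the script above does to a ranking, here is a small Python sketch of the same arithmetic run outside Elasticsearch. The documents and their text scores are made up for illustration, and 100000 is the assumed maximum from the example.

```python
MAX_BOOST_VALUE = 100000  # assumed upper bound of boosting_field


def additive_score(text_score, boosting_field):
    """Mirror of the script: _score + (1 * boosting_field / 100000)."""
    return text_score + (1 * boosting_field / MAX_BOOST_VALUE)


# Made-up documents: text_score stands in for the _score of the inner query.
docs = [
    {"id": "a", "text_score": 1.0, "boosting_field": 500},
    {"id": "b", "text_score": 1.0, "boosting_field": 90000},
    {"id": "c", "text_score": 2.5, "boosting_field": 0},
]

ranked = sorted(
    docs,
    key=lambda d: additive_score(d["text_score"], d["boosting_field"]),
    reverse=True,
)
print([d["id"] for d in ranked])  # ['c', 'b', 'a']
```

Documents "a" and "b" match the query equally well, so the boosting field decides their order, while "c" still wins on text relevance because the scaled boost can add at most 1 to the score.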

You can also consider using the boosting_field as a boost factor (_score * rather than _score +), but then you'd need to scale it to an interval with a minimum value of 1 (just add 1 to the scaled value).

You can also tune how much the field influences the result by adding a weight to the value you use to modify the score. You'll need this even more if you combine multiple boosting factors and want to give each of them a different weight.
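As a rough illustration of the multiplicative variant with weights, the following Python sketch combines several scaled factors into one multiplier. The function name and the second factor in the example are hypothetical; only the idea of "1 plus a weighted, scaled value" comes from the text above.

```python
def boosted_score(text_score, factors):
    """Multiply the text score by 1 plus the weighted sum of scaled factors.

    factors is an iterable of (scaled_value, weight) pairs, where each
    scaled_value is assumed to already lie between 0 and 1.
    """
    multiplier = 1.0 + sum(weight * value for value, weight in factors)
    return text_score * multiplier


# One factor: boosting_field of 50000 scaled by an assumed maximum of 100000.
print(boosted_score(2.0, [(50000 / 100000, 1.0)]))  # 3.0

# Two factors with different weights; the second one is purely hypothetical.
print(boosted_score(2.0, [(0.5, 2.0), (0.25, 1.0)]))  # 4.5
```

Because the multiplier never drops below 1 for non-negative inputs, a document with a boosting value of 0 keeps its original text score instead of being zeroed out.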

javanna
  • Can you accommodate an `and` filter in the custom_score? Right now your query is only match_all; can you add an `and` filter to it? – user12345 Jul 05 '13 at 06:09
  • You can use a [filtered query](http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/) and add both a query and a filter to it for instance. – javanna Jul 05 '13 at 07:36
  • But that will not serve the purpose. What you wrote in the example is fine for me, but I need to add one filter to the main query. – user12345 Jul 05 '13 at 07:44
  • "filter": { "and": [ { "query": { "match": { "xxxx": { "query": "barfoo" } } } } ] } – user12345 Jul 05 '13 at 07:46
  • You can add it as a top-level filter then. If you have a specific question and want a good answer, it would be better to ask your own question. – javanna Jul 05 '13 at 07:50
  • http://stackoverflow.com/questions/17467135/score-while-doing-indexing-in-elasticsearch – user12345 Jul 05 '13 at 08:02
  • Using doc.boosting_field.doubleValue produced errors for me. I instead used doc.boosting_field.getValue(). – Ted Avery Jan 08 '14 at 00:18
  • True, there's now one single query to rule them all: the [function_score](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html) query. – javanna May 01 '14 at 08:49
  • There are [some notes](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_relation_to_literal_custom_boost_literal_literal_custom_score_literal_and_literal_custom_filters_score_literal) at the bottom of the Function Score Query page on how `function_score` relates to `custom_boost_factor`, `custom_score`, and `custom_filters_score`. – anon Jul 30 '14 at 16:35
13

With a recent version of Elasticsearch (version 1.3+) you'll want to use "function score queries":

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

A scored query_string search looks like this:

{
    "query": {
        "function_score": {
            "query": { "query_string": { "query": "my search terms" } },
            "functions": [{ "field_value_factor": { "field": "my_boost" } }]
        }
    }
}

"my_boost" is a numeric field in your search index that contains the boost factor for individual documents. May look like this:

{ "my_boost": { "type": "float", "index": "not_analyzed" } }
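
If you build the request from code, a minimal Python sketch could look like the one below. It only assembles the request body as a dict. The `factor`, `modifier`, `missing`, and `boost_mode` options belong to the function_score query, but the default values chosen here are assumptions for illustration, and their availability depends on your Elasticsearch version.

```python
import json


def field_value_factor_query(search_terms, boost_field="my_boost",
                             factor=1.0, modifier="log1p", missing=1.0,
                             boost_mode="multiply"):
    """Build a function_score request body driven by a numeric field."""
    return {
        "query": {
            "function_score": {
                "query": {"query_string": {"query": search_terms}},
                "functions": [
                    {
                        "field_value_factor": {
                            "field": boost_field,
                            # Multiplier applied to the field value.
                            "factor": factor,
                            # log1p dampens very large field values.
                            "modifier": modifier,
                            # Value used for documents that lack the field.
                            "missing": missing,
                        }
                    }
                ],
                # How the function result is combined with the query score.
                "boost_mode": boost_mode,
            }
        }
    }


body = field_value_factor_query("my search terms")
print(json.dumps(body, indent=2))
```

A dampening modifier plays the same role as scaling the field by hand: it keeps one very large my_boost value from swamping the text relevance score.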
Simon Steinberger
3

If you want to avoid specifying the boost in every query, you might consider adding it to your mapping directly by setting "boost": factor on the field.

So your mapping then may look like this:

{
    "_all" : {"enabled" : "true"},
    "properties" : {
        "_id":            {"type" : "string",  "store" : "yes", "index" : "not_analyzed"},
        "first_name":     {"type" : "string",  "store" : "yes", "index" : "yes"},
        "last_name":      {"type" : "string",  "store" : "yes", "index" : "yes"},
        "boosting_field": {"type" : "integer", "store" : "yes", "index" : "yes", "boost" : 10.0}
    }
}
HolgT
  • Adding it to the query is not just about duplication: that's query time boosting, which you can change every time, while adding the boost to your mapping is index time boosting, which requires a reindex to change. I'd always recommend query time boosting over index time boosting. – javanna May 01 '14 at 08:50
0

If you are using NEST (the .NET client), you should use this syntax:

.Query(q => q
    .Bool(b => b
        .Should(s => s
            .FunctionScore(fs => fs
                .Functions(fn => fn
                    .FieldValueFactor(fvf => fvf
                        .Field(f => f.Significance)
                        .Weight(2)
                        .Missing(1)
        ))))
        .Must(m => m
            .Match(ma => ma
                .Field(f => f.MySearchData)
                    .Query(query)
))))