Elasticsearch multi-match scoring based on the number of matching highlights

Question

I am running a multi-match search with the following query object, which uses script_score:

{
    _source: [
        'baseline',
        'cpcr',
        'date',
        'description',
        'dev_status',
        'element',
        'event',
        'id'
    ],
    track_total_hits: true,
    query: {
       script_score: {
           query: {
               bool: {
                   filter: []
               },
           },
           script: {
               source: "def v=doc['description'].value; def score = 10000; score += v.length(); score -= " + "\"" + searchObject.query + "\"" + ".indexOf(v)*50;", // throws error
               params: { highlights: 3 }
           }
       }
    },
    highlight: { fields: { '*': {} } },
    sort: [],
    from: 0,
    size: 50
}

I'd like the results to be ordered by their number of highlight matches. For instance, the first record would have 5 <em> tags, the second record would have 4 <em> tags, and so on. Currently my results aren't sorted this way.

elasticsearch.config.ts

"settings": {
        "analysis": {
            "analyzer": {
                "search_synonyms": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "graph_synonyms",
                        "lowercase",
                        "asciifolding"
                    ],
                }
            }
        }
    },

    "mappings": {
        "properties": {
            "description": {
                "type": "text",
                "analyzer": "search_synonyms"
            },
            "narrative": {
                "type":"object",
                "properties":{
                    "_all":{
                        "type": "text",
                        "analyzer": "search_synonyms"
                    }
                }
            },
        }
    }

Sample data

Joe - GMapsBook.com · Answer 1 · 2020-05-19T09:50:32.870

0

I don't think that's possible.

When you think about it, since you're using multi_match, the docs with the most field matches would probably score highest, which increases the chances that they'd also have the most <em>s. It'd still be possible to post-process the hits and sort by the number of occurrences.
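
A minimal sketch of that post-processing step, assuming the hits use the default `<em>` highlight tags. The `Hit` interface and both function names are hypothetical and only model the parts of the response used here:

```typescript
// Minimal shape of a search hit; only the parts used here (hypothetical type).
interface Hit {
  _id: string;
  highlight?: Record<string, string[]>;
}

// Count opening <em> tags across every highlighted fragment of a hit.
function countHighlights(hit: Hit): number {
  let total = 0;
  for (const fragments of Object.values(hit.highlight ?? {})) {
    for (const fragment of fragments) {
      total += (fragment.match(/<em>/g) ?? []).length;
    }
  }
  return total;
}

// Return a new array ordered by highlight count, highest first.
function sortByHighlightCount<T extends Hit>(hits: T[]): T[] {
  return [...hits].sort((a, b) => countHighlights(b) - countHighlights(a));
}
```

Two caveats: this only reorders the page that was returned (50 hits in the question's query), and the highlighter returns a limited number of fragments per field by default, so the count can be lower than the real number of matches in a long field.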

The reason it's not possible is that the highlighting mechanism works outside of the sort API, and one cannot reach the other. One can always 'hack' it with some fancy script, but there's no straightforward way to do it.
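
One way such a script 'hack' could look, as a rough sketch that has not been run against a live cluster. It counts literal occurrences of the search term in each field, which only approximates what the highlighter marks (synonyms and analysis are ignored). It assumes every listed field has a `.keyword` sub-field, and `buildOccurrenceScoreQuery` and `OCCURRENCE_SCRIPT` are hypothetical names. The search term is passed through `params` instead of being concatenated into the script source:

```typescript
// Painless source: counts case-insensitive occurrences of params.term in each
// listed field. The term arrives via params, never via string concatenation.
const OCCURRENCE_SCRIPT = `
  String term = params.term.toLowerCase();
  double count = 0;
  if (term.length() == 0) { return 0.0; }
  for (String field : params.fields) {
    if (!doc.containsKey(field) || doc[field].size() == 0) { continue; }
    String value = doc[field].value.toLowerCase();
    int from = value.indexOf(term);
    while (from != -1) {
      count++;
      from = value.indexOf(term, from + term.length());
    }
  }
  return count;
`;

interface ScriptScoreQuery {
  script_score: {
    query: unknown;
    script: { source: string; params: { term: string; fields: string[] } };
  };
}

// Hypothetical helper: wraps an existing query in script_score.
// Assumes each field name has a `.keyword` sub-field in the mapping.
function buildOccurrenceScoreQuery(
  term: string,
  fields: string[],
  innerQuery: unknown
): ScriptScoreQuery {
  return {
    script_score: {
      query: innerQuery,
      script: {
        source: OCCURRENCE_SCRIPT,
        params: { term, fields: fields.map((f) => `${f}.keyword`) },
      },
    },
  };
}
```

Keeping the source constant also means quotes in the user's input cannot break the script, and Elasticsearch can reuse the compiled script across requests.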

Addendum

Check out this related answer to access multiple fields within a script: https://stackoverflow.com/a/61620705/8160318

edited May 19 '20 at 09:50

answered May 18 '20 at 11:51


Is it possible to adjust the document score based on the number of highlights/matching occurrences? – shAkur May 18 '20 at 19:12
I tried using a script_score to provide a custom score for each document but I don't know how to access multiple fields within `"source"` object (since I use a `multi_match` and search within multiple fields). Please check my updated question. – shAkur May 19 '20 at 05:17
You'll need to set `fielddata:true` on your text fields: https://stackoverflow.com/a/38156296/8160318 Or use the `keyword` datatype -- but those are case-sensitive so you'll need to adjust your script to take care of that. – Joe - GMapsBook.com May 19 '20 at 07:50
Thanks, the `fielddata:true` option seems to work. However I'm still not sure how do I search within multiple fields for matching occurrences? In the question above I'm only using description field `doc['description'].value` – shAkur May 19 '20 at 09:09
Seems like I can access field data using `doc[key].value` inside `for loop` but throws error when using `doc[key + 'keyword'].value`. Also it seems `fielddata:true` doesn't work for the `narrative` object from the above mapping – shAkur May 19 '20 at 12:09
use `doc[key + '.keyword'].value` (dot before `keyword`). Also, all your `.keyword` fields must be defined as such in your mapping. Consider dropping the index & reindexing. – Joe - GMapsBook.com May 19 '20 at 12:18
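
For reference, a sketch of what the mapping from the question might look like with `.keyword` sub-fields added, assuming the same `search_synonyms` analyzer. `textWithKeyword` is a hypothetical helper, not part of any Elasticsearch client:

```typescript
// Hypothetical helper: a text field plus a `keyword` sub-field for scripts.
function textWithKeyword(analyzer: string) {
  return {
    type: 'text',
    analyzer,
    fields: {
      keyword: { type: 'keyword' },
    },
  };
}

const mappings = {
  properties: {
    description: textWithKeyword('search_synonyms'),
    narrative: {
      type: 'object',
      properties: {
        _all: textWithKeyword('search_synonyms'),
      },
    },
  },
};
```

With this mapping a script would read `doc['description.keyword'].value` and `doc['narrative._all.keyword'].value`. Documents indexed before the sub-field existed have no value in it until they are reindexed.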
