
I am having problems with Elasticsearch: results in which the search term stands on its own seem to be favored.

We have a large subtitle database that was indexed using Elasticsearch. However, our searches seem to prioritize results in which the search term is isolated rather than part of a sentence.

For example, a search for "Eat" returns:

Oh, skydiving. // Skydiving. // Oh, I got that one. // Eating crazy. // Eating, eating. // Just pass, just pass. // You guys suck at that. // What was that? // Synchronized swimming

AND

it's my last night so we're gonna live // life like there's no tomorrow. // - I think I'd just wanna, // - Eat. // - Bring all the food, // whether it's Mcdonald's, whether it's, // - Ice cream.

We need to INSTEAD prioritize results where the search term is found WITHIN a sentence, rather than on its own.

I need help determining what needs to be fixed: the mapping, the filters, the tokenizers, etc.

Here are my settings:

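public static function getSettings(){
    return [
        'number_of_shards' => 1,
        'number_of_replicas' => 1,
        'analysis' => [
            'filter' => [
                // Explicit English stemmer
                'filter_stemmer' => [
                    'type' => 'stemmer',
                    'language' => 'english'
                ]
            ],
            'analyzer' => [
                'text_analyzer' => [
                    'type' => 'custom',
                    // "stopwords" belongs to the standard/stop analyzers, not to
                    // custom analyzers, so it likely has no effect here
                    "stopwords" => [],
                    // The built-in 'stemmer' filter also defaults to English,
                    // so each token is stemmed twice
                    'filter' => ['lowercase', 'filter_stemmer','stemmer'],
                    'tokenizer' => 'standard'
                ],
            ]
        ]
    ];
}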

And here is my mapping:

https://gist.github.com/firecentaur/d0e1e196f7fddbb4d02935bec5592009

And here is my search query:

https://gist.github.com/firecentaur/5ac97bbd8eb02c406d6eecf867afc13c

What am I doing wrong?

Paul Preibisch
Sorry, but your question is not very clear. Can you provide your query, sample docs, the docs currently returned and the expected docs in JSON format, so that I can help you? – Amit Sep 09 '20 at 02:46

1 Answer


This behavior must be caused by field-length normalization in the scoring algorithm (TF/IDF, or BM25, which has been Elasticsearch's default similarity since version 5.0). When a query matches a field, the match counts for more if the field contains only a few words. If you want to adapt this to your use case, you can use a function_score query. This post should help you find a solution:

How can I boost the field length norm in elasticsearch function score?
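To make the length effect concrete, here is a minimal PHP sketch, assuming Elasticsearch's default BM25 parameters (k1 = 1.2, b = 0.75) and a made-up average line length of 10 tokens. It uses the textbook BM25 term-frequency factor, which only approximates what Lucene computes (Lucene, for instance, stores field lengths lossily). It also sketches one possible function_score request in the same array style as `getSettings()`. The field name `text`, its `text.length` token_count sub-field, and the helper names `bm25TfPart`, `subtitleMapping` and `lengthAwareQuery` are all hypothetical, since the real mapping is only in the gist.

```php
<?php
// Textbook BM25 per-term score without the IDF factor: IDF is the same for every
// document matching the same term, so it does not change their relative order.
// b controls how strongly the field length is penalised (b = 0 ignores it).
function bm25TfPart(int $tf, int $fieldLen, float $avgFieldLen,
                    float $k1 = 1.2, float $b = 0.75): float
{
    $lengthNorm = 1 - $b + $b * ($fieldLen / $avgFieldLen);
    return $tf * ($k1 + 1) / ($tf + $k1 * $lengthNorm);
}

// Hypothetical mapping fragment: 'text' is an assumed field name; the
// token_count sub-field stores how many tokens each subtitle line has.
function subtitleMapping(): array
{
    return [
        'properties' => [
            'text' => [
                'type'     => 'text',
                'analyzer' => 'text_analyzer',
                'fields'   => [
                    'length' => ['type' => 'token_count', 'analyzer' => 'standard'],
                ],
            ],
        ],
    ];
}

// Hypothetical search body: field_value_factor multiplies each match score by
// log(1 + token count), which counteracts the preference for short lines.
function lengthAwareQuery(string $term): array
{
    return [
        'query' => [
            'function_score' => [
                'query'              => ['match' => ['text' => $term]],
                'field_value_factor' => [
                    'field'    => 'text.length',
                    'modifier' => 'log1p',
                    'missing'  => 1,
                ],
                'boost_mode' => 'multiply',
            ],
        ],
    ];
}

$avg = 10.0; // hypothetical average line length, in tokens

// One match in a 1-token line ("Eat.") versus one match in a 12-token line.
printf("short line: %.3f\n", bm25TfPart(1, 1, $avg));   // 1.583
printf("long line:  %.3f\n", bm25TfPart(1, 12, $avg));  // 0.924

// With b = 0 the field length is ignored and both lines score the same.
printf("b = 0:      %.3f / %.3f\n",
    bm25TfPart(1, 1, $avg, 1.2, 0.0), bm25TfPart(1, 12, $avg, 1.2, 0.0)); // 1.000 / 1.000
```

In this toy example the log1p factor multiplies the one-token line by ln 2 ≈ 0.69 and the 12-token line by ln 13 ≈ 2.56, so the longer line ends up ranked first; how strongly to counteract the length norm is a tuning choice for your data.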

Jaycreation