I am trying to get the positions of the matches, rather than highlighted text, as the result of an Elasticsearch query.

Create the index:

PUT /test/
{
  "mappings": {
    "article": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "english"
        },
        "author": {
          "type": "text"
        }
      }
    }
  }
}

Put a document:

PUT /test/article/1
{
  "author": "Just Me",
  "text": "This is just a simple test to demonstrate the audience the purpose of the question!"
}

Search the document:

GET /test/article/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "text": {
              "query": "simple test",
              "_name": "must"
            }
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "text": {
              "query": "need help",
              "_name": "first",
              "slop": 2
            }
          }
        },
        {
          "match_phrase": {
            "text": {
              "query": "purpose question",
              "_name": "second",
              "slop": 3
            }
          }
        },
        {
          "match_phrase": {
            "text": {
              "query": "don't know anything",
              "_name": "third"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}

When I run this search, I get a result like this:

This is just a simple test to <em>demonstrate</em> the audience the purpose of the <em>question</em>!

I'm not interested in getting the matches wrapped in em tags. Instead, I want to get the positions of all matches, like so:

"hits": [
   { "start_offset": 30, "end_offset": 40 },
   { "start_offset": 74, "end_offset": 81 }
]

Hope you get my idea!

Stefan

1 Answer

To get the offset positions of a word in a text, you should add a term vector to your index mapping (doc here). As written in the doc, you have to enable this parameter at index time:

"term_vector": "with_positions_offsets_payloads"
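
For context, here is a sketch of where that setting could go, reusing the mapping from the question. It assumes the index is created from scratch: as far as I know, `term_vector` cannot be changed on a field that already exists, so the index would have to be deleted and recreated (or the data reindexed) first.

```
PUT /test/
{
  "mappings": {
    "article": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "english",
          "term_vector": "with_positions_offsets_payloads"
        },
        "author": {
          "type": "text"
        }
      }
    }
  }
}
```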

For the specific query, please follow the linked doc page.
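
As a rough illustration only, a term vectors request for the document from the question might look like the following. It assumes the typed endpoint that matches the `/test/article/1` URL used above; the flags simply ask for positions and offsets and switch off the statistics.

```
GET /test/article/1/_termvectors
{
  "fields": ["text"],
  "positions": true,
  "offsets": true,
  "payloads": false,
  "term_statistics": false,
  "field_statistics": false
}
```

The response lists every term of the field as produced by the analyzer (so stemmed forms with the `english` analyzer), each with its `position`, `start_offset` and `end_offset`. It is a per-document listing, not the result of a query.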

Lupanoide
  • Thanks for your answer, but this doesn't solve my problem. Maybe my question wasn't precise enough, or maybe I don't understand term_vectors correctly. If I understand this article correctly, Elasticsearch gives me back a "listing" of all words in that document, including the positions where they occur. To get these positions, I would need to know the exact word. But I need to search for phrases, and I search for synonyms as well, so I can't look up the matches in the list the way the example does. I've edited my question to show a more realistic query and wonder how to match that with term_vectors. – Stefan Apr 11 '18 at 09:42
  • @Stefan look at this answer: https://stackoverflow.com/questions/63460335/return-position-and-highlighting-of-search-queries-in-elasticsearch/63464116#63464116 Does it solve your question? – Lupanoide Sep 08 '20 at 08:30