21

I have been trying to use Elasticsearch for our application, but the 10k limit on pagination is an issue for us, and the scroll API is not a recommended choice either because of its timeout issue.

I found out that Elasticsearch has something called search_after, which is the ideal solution for supporting deep pagination. I have been trying to understand it from the docs, but it's a bit confusing and I was not able to clearly understand how it works.

Let's assume I have three fields in my document: id, first_name and last_name, where id is a unique primary key.

{
    "size": 10,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "sort": [
        {"id": "asc"}      
    ]
}

Can I use the above query with search_after? I read in the docs that we have to use multiple unique values in the sort rather than just one (id), but in my dataset only id is unique. What can I do to use search_after with my example dataset?

I was not able to understand the issue they describe when a single unique tiebreaker is used for the sort. Can someone explain this in layman's terms?

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-search-after.html

A field with one unique value per document should be used as the tiebreaker of the sort specification. Otherwise the sort order for documents that have the same sort values would be undefined and could lead to missing or duplicate results. The _id field has a unique value per document but it is not recommended to use it as a tiebreaker directly. Beware that search_after looks for the first document which fully or partially matches tiebreaker’s provided value. Therefore if a document has a tiebreaker value of "654323" and you search_after for "654" it would still match that document and return results found after it. doc value are disabled on this field so sorting on it requires to load a lot of data in memory. Instead it is advised to duplicate (client side or with a set ingest processor) the content of the _id field in another field that has doc value enabled and to use this new field as the tiebreaker for the sort.

user_12
  • As per my understanding, you can use sort on only one field provided that field value is unique. When you want to sort the documents based on some field which is not unique, then you need to add multiple sort fields (one with unique value as a secondary sort) as a tie breaker. – Pramod Jun 25 '21 at 10:33
  • @Pramod From the docs, they mentioned using only _ID field is not ideal because, search_after does partial matches rather than fully match I guess. This seems what they explain in their docs page. I was wondering how I can solve this issue? – user_12 Jun 25 '21 at 10:49
  • I thought `Id` field you mentioned is different from `_id` field. Yes, it is not advised to use `_id` in sorting as it requires to load a lot of data in memory. You can copy the `_id` field as `id` field of the document and use that for sorting. – Pramod Jun 25 '21 at 11:13
  • @Pramod Sorry, the ID field is different from _id field. It was a typo. What about the problems that they are discussing about, `Therefore if a document has a tiebreaker value of "654323" and you search_after for "654" it would still match that document and return results found after it.` – user_12 Jul 02 '21 at 01:09

2 Answers

33

In your case, if your id field contains unique values and has the type keyword (or numeric) then you're absolutely fine and can use it to paginate using search_after.

So the first call would be the one you have in your question, here with score added as a second sort field (see the comments below):

{
    "size": 10,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "sort": [
        {"id": "asc"},
        {"score": "desc"}      
    ]
}

In your response, you need to look at the last hit and take the sort values from that last hit:

{
    "_index" : "myindex",
    "_type" : "_doc",
    "_id" : "100000012",
    "_score" : null,
    "_source": { ... },
    "sort" : [
      "100000012",                                <--- take this
      "98"                                        <--- take this
    ]
}

Then in your next search call, you specify those values in search_after, keeping the same sort as in the first call:

{
    "size": 10,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "search_after": [ "100000012", "98" ],        <--- add this
    "sort": [
        {"id": "asc"},
        {"score": "desc"}
    ]
}

And the first hit of the next result set will be the document that sorts right after that one (id: 100000013 if your ids are consecutive). That's it. There's nothing more to it.

The problem you're pointing at does not concern you if you always sort with full id values. The way it works is that you always use the last id value from the previous results. If you were to add "search_after": ["1000"] then you'd have the issue they mention, but there's no reason for you to do it.
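
To make the loop above concrete, here is a minimal sketch in plain Python. It is only an in-memory model of the behaviour, not the Elasticsearch client: `DOCS`, `sort_key`, `search` and `paginate` are hypothetical names made up for this illustration. It models the score-first ordering discussed in the comments (`[{"score": "desc"}, {"id": "asc"}]`), with several documents sharing a score so that the unique id has to act as tiebreaker.

```python
# Hypothetical in-memory documents; several share the same score on purpose.
DOCS = [
    {"id": "100000010", "score": 98},
    {"id": "100000011", "score": 75},
    {"id": "100000012", "score": 98},
    {"id": "100000013", "score": 75},
    {"id": "100000014", "score": 98},
]


def sort_key(doc):
    # Models "sort": [{"score": "desc"}, {"id": "asc"}]:
    # the score is negated for descending order, the unique id breaks ties.
    return (-doc["score"], doc["id"])


def search(size, search_after=None):
    """Return up to `size` hits that sort strictly after `search_after`."""
    ordered = sorted(DOCS, key=sort_key)
    if search_after is not None:
        score, doc_id = search_after
        ordered = [d for d in ordered if sort_key(d) > (-score, doc_id)]
    # Each hit carries its sort values, like the "sort" array in a response.
    return [{"doc": d, "sort": [d["score"], d["id"]]} for d in ordered[:size]]


def paginate(size):
    """Walk all pages by feeding the last hit's sort values into the next call."""
    pages, cursor = [], None
    while True:
        hits = search(size, cursor)
        if not hits:
            break
        pages.append([h["doc"]["id"] for h in hits])
        cursor = hits[-1]["sort"]
    return pages


pages = paginate(2)
print(pages)
```

Because the cursor always contains the full sort values of the last hit, including the unique id, every document shows up exactly once even though the scores repeat.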

Val
  • Will search_after work if I have one more column (field) called `score`, each document will have a score of 0 -100 and the problem is there could be documents with same score. I want resulted documents with scores in descending order. In this case, do I need to use both ID and score together in sort. If I do that, which will get more priority. Will my results sorted based on id or score. Will i be able to use search after? – user_12 Jul 02 '21 at 08:35
  • If you search with two sort fields (id first and score second), then the `sort` array in the results will have two values (`["100000012", "98"]`) and you'll need to use both values in the `search_after` for the next query. But since `id` has unique value, you don't run the risk of missing any data. I've updated my answer accordingly – Val Jul 02 '21 at 08:54
  • I hope, I can interchange ```[{"score": "desc"} ,{"id": "asc"} ]```, because if I use `id` first then results are sorted based on ID, if I use `score` first then results are sorted according to scores. I want results based on scores in desc. – user_12 Jul 02 '21 at 09:57
  • It's ok to swap the sort fields as long as there's the `id` one that acts as tiebreaker – Val Jul 02 '21 at 10:03
  • All good. Closed the question. Thanks again for all those clarifications. – user_12 Jul 03 '21 at 07:51
  • On a side note, I'm facing another issue not at all related to this question, but if you do have time, please do help out. https://stackoverflow.com/questions/68224968/java-lang-illegalargumentexception-setting-index-lifecycle-rollover-alias-for – user_12 Jul 03 '21 at 07:53
  • The document further adds that, in case you don't have any column as `unique` key for all of your records, or at least if you are not sure which one to consider `unique`, then you can use sorting `{"_shard_doc": "desc"}` (as it is, in `sort` field). If using `pit` (point in time), it will be implicitly provided, (but still you can explicitly provide it). read docs for what it is (_shard_doc), but its not actually necessary. – Hari Kishore Oct 28 '21 at 18:49
  • Does it work with function_score instead of sort in the request? – insanely_sin Dec 05 '22 at 20:25
  • @insanely_sin please create a new thread referencing this one and explaining your use case – Val Dec 05 '22 at 20:56
0

I've added a simple test to make it more understandable. You can have a look.

POST search_after/_bulk
{"index":{}}
{"id":1,"field_name":"field_value test 1"}
{"index":{}}
{"id":2,"field_name":"field_value test 2"}
{"index":{}}
{"id":3,"field_name":"field_value test 3"}
{"index":{}}
{"id":4,"field_name":"field_value test 4"}
{"index":{}}
{"id":5,"field_name":"field_value test 5"}
{"index":{}}
{"id":6,"field_name":"field_value test 6"}
{"index":{}}
{"id":7,"field_name":"field_value test 7"}

#first query

GET search_after/_search
{
  "size": 3, 
  "query": {
    "match": {
      "field_name": "field_value"
    }
  },
  "search_after": ["0"],
  "sort": [
    {
      "id": {
        "order": "asc"
      }
    }
  ]
}

#second query

GET search_after/_search
{
  "size": 3, 
  "query": {
    "match": {
      "field_name": "field_value"
    }
  },
  "search_after": ["3"],
  "sort": [
    {
      "id": {
        "order": "asc"
      }
    }
  ]
}
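
As a rough guide to what the two queries above should return, here is a small pure-Python model of the same test. It does not talk to Elasticsearch; `DOCS` and `search` are hypothetical names for this sketch, and it assumes `id` is sorted as a number.

```python
# Same seven documents as in the bulk request above (ids 1 to 7).
DOCS = [{"id": i, "field_name": f"field_value test {i}"} for i in range(1, 8)]


def search(size, search_after):
    """Model of sort [{"id": "asc"}] with search_after: ids greater than the cursor."""
    ordered = sorted(DOCS, key=lambda d: d["id"])
    return [d["id"] for d in ordered if d["id"] > search_after][:size]


first = search(3, 0)            # models "search_after": ["0"]
second = search(3, first[-1])   # cursor is the last id of the first page
third = search(3, second[-1])   # one more page picks up the remaining document
print(first, second, third)
```

In this model the first page holds ids 1 to 3, the second holds 4 to 6, and a third call with the last id of the second page returns the remaining document.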
Musab Dogan