I need to retrieve all documents (more than 10,000) sorted by id. To get all documents I have to use scan(), but scan() does not let me sort by id. Is there another way to get more than 10,000 sorted documents?
-
Why do you need to sort by _id? Which version of Elasticsearch are you running? The scan helper is designed to return documents without sorting: https://elasticsearch-py.readthedocs.io/en/master/helpers.html#scan – Lupanoide Jul 27 '20 at 18:47
-
However, you could follow this example of a search_after query: https://stackoverflow.com/questions/49320599/elastic-search-not-giving-data-with-big-number-for-page-size/49321145#49321145. I would still like to understand your use case better. – Lupanoide Jul 27 '20 at 18:50
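For illustration, a minimal sketch of what search_after pagination might look like as a Python generator. It assumes an elasticsearch-py 7.x style client whose `search(index=..., body=...)` returns a dict, and a unique, sortable field named `id`. The function name `iter_sorted_hits`, the index name, and the page size are hypothetical. Because each request starts from the sort values of the previous page rather than from an offset, this approach should not run into the 10,000-result window that applies to from/size paging.

```python
def iter_sorted_hits(es, index, sort_field="id", page_size=1000, query=None):
    """Yield every hit of `index` in ascending `sort_field` order.

    `es` is assumed to behave like an elasticsearch-py 7.x client.
    `sort_field` is assumed to hold unique values; without a unique
    tiebreaker, documents sharing a sort value could be skipped.
    """
    body = {
        "size": page_size,
        "query": query or {"match_all": {}},
        "sort": [{sort_field: "asc"}],
    }
    while True:
        resp = es.search(index=index, body=body)
        hits = resp["hits"]["hits"]
        if not hits:
            break  # an empty page means everything has been returned
        for hit in hits:
            yield hit
        # Continue after the sort values of the last hit on this page.
        body["search_after"] = hits[-1]["sort"]


# Hypothetical usage (requires a running cluster):
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# for hit in iter_sorted_hits(es, "my-index"):
#     print(hit["_source"]["id"])
```

Only one page of hits is held in memory at a time, which fits the batch-wise processing described in the comments.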
-
The ES version is 7.8 – McDizzy Jul 27 '20 at 20:02
-
I have two Python functions that return different parts of a document I want to build. Each function yields its entities. Since it isn't possible to join two indices in ES, I have to make sure that both functions return the entities in the same order so that I can combine the documents step by step. Therefore I want to sort the ES results and have both functions yield them by ascending document id. Since there are hundreds of thousands of documents, loading everything into memory is very inefficient. This is why I want to solve it with yield and split the problem into batches. – McDizzy Jul 27 '20 at 20:08
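The step-by-step combination described here amounts to a merge join over two sorted streams. A minimal sketch, assuming both generators yield dicts in ascending order of a unique `id` and use the same ordering on both sides. The name `merge_by_id` and the sample records are hypothetical.

```python
def merge_by_id(left, right, key=lambda item: item["id"]):
    """Yield (left_item, right_item) pairs whose keys match.

    Both inputs are assumed to be sorted ascending by `key` with no
    duplicate keys. Items present on only one side are skipped.
    """
    left, right = iter(left), iter(right)
    l, r = next(left, None), next(right, None)
    while l is not None and r is not None:
        kl, kr = key(l), key(r)
        if kl == kr:
            yield l, r
            l, r = next(left, None), next(right, None)
        elif kl < kr:
            l = next(left, None)  # left is behind, advance it
        else:
            r = next(right, None)  # right is behind, advance it


# Hypothetical sample data standing in for the two ES result streams.
names = ({"id": i, "name": f"doc-{i}"} for i in [1, 2, 4, 5])
sizes = ({"id": i, "size": i * 10} for i in [2, 3, 4, 5])

combined = [{**a, **b} for a, b in merge_by_id(names, sizes)]
```

Only the current item of each stream is kept in memory, so the approach scales to hundreds of thousands of documents as long as both sources are sorted the same way.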
-
However, I do not necessarily have to sort by document id; there is also a field called id that I can sort by. – McDizzy Jul 28 '20 at 20:08