39

This issue contains a feature request for ordering with an optional seed, which would allow a random order to be recreated.

I need to be able to paginate randomly ordered results. How could this be done with Elasticsearch 0.19.1?

Thanks.

Yeggeps

6 Answers

76

This should be considerably faster than both answers above and supports seeding:

curl -XGET 'localhost:9200/_search' -d '{
  "query": {
    "function_score" : {
      "query" : { "match_all": {} },
      "random_score" : {}
    }
  }
}';

See: https://github.com/elasticsearch/elasticsearch/issues/1170
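
The query above does not set a seed. Here is a minimal sketch of how a seeded, paginated request body might be built, assuming an Elasticsearch version where `random_score` accepts `seed` (recent versions also expect a `field` such as `_seq_no` alongside it). `build_random_page` and the page size are hypothetical names chosen for illustration:

```python
import json

def build_random_page(seed, page, size=20):
    """Build a search body that returns one page of a reproducible random order."""
    return {
        "from": page * size,  # zero-based offset of the first hit on this page
        "size": size,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                # Same seed -> same scores -> same order on every request.
                # "field" is an assumption for newer versions; drop it on older ones.
                "random_score": {"seed": seed, "field": "_seq_no"},
                # Use the random score instead of combining it with the query score.
                "boost_mode": "replace",
            }
        },
    }

print(json.dumps(build_random_page(seed=42, page=1), indent=2))
```

Keeping the seed fixed while only `page` changes should give consistent pages, provided the documents do not change between requests.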

Nariman
  • Thanks for your help. I think the answer would make it clearer that random_score supports seeding if you updated the example query. – Jonas Anso Feb 21 '17 at 10:48
  • @MbRostami for filters you must add `"boost_mode": "replace",`, see https://stackoverflow.com/a/48338880/1194525 – bato3 Mar 21 '18 at 12:32
47

You can sort using a hash function of a unique field (for example id) and a random salt. Depending on how truly random the results should be, you can do something as primitive as:

{
  "query" : { "query_string" : {"query" : "*:*"} },
  "sort" : {
    "_script" : { 
        "script" : "(doc['_id'].value + salt).hashCode()",
        "type" : "number",
        "params" : {
            "salt" : "some_random_string"
        },
        "order" : "asc"
    }
  }
}

or something as sophisticated as:

{
  "query" : { "query_string" : {"query" : "*:*"} },
  "sort" : {
    "_script" : { 
        "script" : "org.elasticsearch.common.Digest.md5Hex(doc['_id'].value + salt)",
        "type" : "string",
        "params" : {
            "salt" : "some_random_string"
        },
        "order" : "asc"
    }
  }
}

The second example will produce more random results but will be somewhat slower.

For this approach to work, the _id field has to be stored; otherwise, the query will fail with a NullPointerException.
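
Why a fixed salt makes pagination possible can be illustrated outside Elasticsearch. The sketch below uses Python's `hashlib` as a stand-in for the server-side script; `salted_order` and `page_of` are hypothetical helpers, and the ids are made up:

```python
import hashlib
import secrets

def salted_order(doc_ids, salt):
    """Sort ids by the MD5 hex digest of id + salt, mirroring the second script above."""
    return sorted(
        doc_ids,
        key=lambda doc_id: hashlib.md5((doc_id + salt).encode("utf-8")).hexdigest(),
    )

def page_of(doc_ids, salt, page, size):
    """Slice one page out of the salted order."""
    ordered = salted_order(doc_ids, salt)
    return ordered[page * size:(page + 1) * size]

doc_ids = [str(i) for i in range(50)]
salt = secrets.token_hex(8)  # generate once per session and reuse for every page

first = page_of(doc_ids, salt, page=0, size=10)
second = page_of(doc_ids, salt, page=1, size=10)
print(first, second)
```

Because the order depends only on the ids and the salt, every page requested with the same salt is a slice of the same ordering, so pages never overlap. A new salt produces a new ordering.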

Muhammad Hassaan
imotov
  • Would I store the string on the client then? For example in a cookie? So that when the user calls for page 2 the same order is preserved? – Yeggeps Mar 21 '12 at 13:42
  • The salt string should be generated and stored on the layer that maintains the user's session. It can be the same place where you store the user's query or the page number that is currently displayed. It can be a cookie as well. – imotov Mar 21 '12 at 13:58
  • Just a heads up, when implementing this solution with an index of 10M+ documents, it drastically increased CPU usage on the data nodes. I was expecting an increase, but not maxing out the servers. – Michael Love Sep 06 '22 at 13:47
25

Good solution from imotov.

Here is something much simpler that doesn't rely on a document property:

{
  "query" : { "query_string" : {"query" : "*:*"} },
  "sort" : {
    "_script" : { 
        "script" : "Math.random()",
        "type" : "number",
        "params" : {},
        "order" : "asc"
    }
  }
}

If you want to set a range, that would be something like:

{
  "query" : { "query_string" : {"query" : "*:*"} },
  "sort" : {
    "_script" : { 
        "script" : "Math.random() * (myMax - myMin) + myMin",
        "type" : "number",
        "params" : {},
        "order" : "asc"
    }
  }
}

replacing myMax and myMin with your own values.
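
One consequence of `Math.random()` is worth noting: each request draws new sort values, so consecutive pages come from different orderings and can repeat documents. A small simulation in plain Python illustrates this (no Elasticsearch involved; `random_page` is a hypothetical helper):

```python
import random

def random_page(doc_ids, page, size, rng):
    """Return one page after sorting by fresh random keys, as an unseeded script sort would."""
    keys = {doc_id: rng.random() for doc_id in doc_ids}
    ordered = sorted(doc_ids, key=keys.__getitem__)
    return ordered[page * size:(page + 1) * size]

rng = random.Random()
doc_ids = list(range(100))
page1 = random_page(doc_ids, 0, 20, rng)  # first request
page2 = random_page(doc_ids, 1, 20, rng)  # second request, new random order
print("documents on both pages:", sorted(set(page1) & set(page2)))
```

Avoiding such repeats needs an ordering that is reproducible across requests, for example one derived from a seed or salt.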

DavidGOrtega
  • This is a good general solution. However, the original question was asking for "optional seed allowing for recreation of random order". That's where all the complexity is coming from. – imotov Sep 26 '12 at 16:49
  • Yes, you are completely right. My solution is much more suitable for the title "Random order & pagination Elasticsearch". Completely insufficient for Yeggeps' needs. – DavidGOrtega Sep 26 '12 at 18:30
  • Great answer but unfortunately this doesn't eliminate the overhead of script sort either... still adds > 1s to our query over 2M docs. – Nariman Dec 26 '13 at 20:59
  • I get ~10 duplicates for every 100 results (with size 20). How to eliminate duplicates? – Vingtoft May 25 '20 at 12:09
3

I ended up solving it slightly differently from what imotov suggested. As I have multiple clients, I didn't want to implement the logic surrounding the salt string on every one of them.

I already had a randomized_key on the model. I also didn't need the order to be random for every request, so I made a scheduled job to update the randomized_key every night and then sorted by that field in Elasticsearch.
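
A rough sketch of that idea, assuming the key is a float stored in a field named `randomized_key`. `reshuffle` and `page_request` are hypothetical helpers, and the documents are plain dictionaries rather than real model objects:

```python
import random

def reshuffle(docs, rng=None):
    """Assign a fresh randomized_key to every document (what the nightly job would do)."""
    rng = rng or random.Random()
    for doc in docs:
        doc["randomized_key"] = rng.random()
    return docs

def page_request(page, size=20):
    """Search body that pages through documents in randomized_key order."""
    return {
        "from": page * size,
        "size": size,
        "sort": [{"randomized_key": {"order": "asc"}}],
    }

docs = reshuffle([{"id": i} for i in range(5)])
print(sorted(docs, key=lambda d: d["randomized_key"]))
print(page_request(page=0))
```

Since the key is an ordinary indexed field, sorting on it avoids script evaluation at query time, and the order stays the same for all clients until the next run of the job.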

Yeggeps
2

New format:

{
    "sort": {
        "_script": {
            "type": "number",
            "script": {
                "source": "Math.random()",
                "lang": "painless"
            },
            "order": "asc"
        }
    }
}
Şafak Saylam
0

Well, I was looking at doing this and all the approaches above seemed a little "too complicated" for something that should be relatively simple. So I came up with an alternative that works perfectly well without the need for "going mental".

I perform a _count query first, then use a random offset between 0 and that count as the "from" (start) value of the search.

e.g.

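// $params is the request to send to Elasticsearch (index name and query are placeholders)
$params = ['index' => 'my_index', 'body' => ['query' => ['match_all' => new \stdClass()]]];
$total_results = $ElasticSearchClient->count($params)['count'];
// "from" is zero-based, so the largest useful offset is $total_results - 1
$params['body']['from'] = rand(0, max(0, $total_results - 1));
$results = $ElasticSearchClient->search($params);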

Assumptions for the above example:

  • You're running PHP
  • You're also using the PHP client

But you don't need to do this with PHP; the approach would work with any language or client.
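
As an illustration of that last point, here is the same offset logic sketched in Python without any client library. `random_from` and `with_random_offset` are hypothetical helpers, and the total would come from a prior count request:

```python
import random

def random_from(total_results, size=10, rng=random):
    """Pick a random zero-based offset that still leaves a full page when possible."""
    last_start = max(0, total_results - size)
    return rng.randint(0, last_start)  # randint is inclusive on both ends

def with_random_offset(body, total_results, size=10):
    """Return a copy of the search body with from/size set to a random window."""
    paged = dict(body)
    paged["from"] = random_from(total_results, size)
    paged["size"] = size
    return paged

print(with_random_offset({"query": {"match_all": {}}}, total_results=1000))
```

Note that this selects a random contiguous window of the default ordering rather than shuffling individual documents, so neighbouring results always appear together.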

Andy