I am trying to return the results of a filter query in a randomized order, so that all my documents get a fair chance of appearing on the first page of results. To avoid confusing users during repeated searches (and to support pagination easily), the order should stay consistent for the current day.
To do this I have developed the following script sort query. It combines the document id (a GUID, so already fairly random) with a daily salt (just the day of year and the current year combined) and hashes the result. I would expect this to produce a fairly random string that only changes when the daily salt changes each day. (Ignore the extraneous elements in this specific query; it's generated from code.)
{
"from": 0,
"size": 20,
"sort": {
"_script": {
"order": "asc",
"type": "string",
"script": "org.elasticsearch.common.Digest.md5Hex(dailySalt + doc['id'].value)",
"params": {
"dailySalt": "184-2013"
}
}
},
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"tag_id": "Some Tag"
}
},
{
"match_all": {}
}
]
}
}
}
},
"fields": [
"id"
]
}
Inspired by this similar question and answer.
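To make the intended ordering concrete, here is a minimal client-side sketch of the same idea in Python. The function names (`daily_salt`, `sort_key`, `daily_order`) are hypothetical, and the UTF-8 encoding of the input is an assumption; it has not been checked against the exact bytes that `Digest.md5Hex` hashes, so the keys may not match the ones Elasticsearch returns.

```python
import hashlib
from datetime import date


def daily_salt(day: date) -> str:
    """Build the salt as '<day of year>-<year>', e.g. '184-2013'."""
    return f"{day.timetuple().tm_yday}-{day.year}"


def sort_key(salt: str, doc_id: str) -> str:
    """Hex MD5 of the salt concatenated with the document id.

    Assumption: the string is hashed as UTF-8 bytes.
    """
    return hashlib.md5((salt + doc_id).encode("utf-8")).hexdigest()


def daily_order(doc_ids, day: date):
    """Sort ids ascending by their salted hash."""
    salt = daily_salt(day)
    return sorted(doc_ids, key=lambda doc_id: sort_key(salt, doc_id))


# A few ids taken from the response shown later in this post.
ids = [
    "27662ef8-d2a7-4fde-80f6-1571b83c4cde",
    "d9b11797-053f-495e-a676-0ec959dba879",
    "482c893f-1083-4860-892e-1b25cf442199",
]

print(daily_salt(date(2013, 7, 3)))
print(daily_order(ids, date(2013, 7, 3)))
```

The order depends only on the salt and the ids, so it is stable within a day regardless of the order in which the documents are supplied.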
It works, but not very well. I get slightly different results as I increment the daily salt, but the same documents keep appearing around the top results. They move slightly, but there's definitely a consistent pattern.
I've tried to change the hash function to another I found:
org.elasticsearch.cluster.routing.operation.hash.djb.DjbHashFunction.DJB_HASH
but it gives very similar results, with the same documents appearing near the top.
I'm no cryptography expert, so I presume this is normal behavior for common hash functions. Are there special hash functions that give more randomized results for similar inputs?
Is anyone familiar with one that is available in Elasticsearch? I'm using Searchbox.io (a cloud-hosted Elasticsearch service), so installing my own custom function is not an option.
Or am I approaching this problem from a completely wrong angle?
Edit: I just looked at the sort keys produced by the script, and it appears that the script is only being applied to the first page of results, which is then sorted on its own (rather than being applied to the full result set and therefore changing which documents appear on the first page).
Here are my first-page results (edited for brevity). You can see that on the first page alone the sort key varies from 0c*** to fa*** across the first 20 docs, out of a total of ~200 docs.
Using 'dailySalt' = 185-2013
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 196,
"max_score": null,
"hits": [{
"fields": {
"id": "27662ef8-d2a7-4fde-80f6-1571b83c4cde"
},
"sort": ["0cbf8b4e7927f0a53a5b82f2630ff9ad"]
}, {
"fields": {
"id": "d9b11797-053f-495e-a676-0ec959dba879"
},
"sort": ["0fa8730a5239f8a3d1286cbe16619bfa"]
}, {
"fields": {
"id": "482c893f-1083-4860-892e-1b25cf442199"
},
"sort": ["295edd71cc48ac41c5e2f91315abf5ce"]
}, {
"fields": {
"id": "581fd0f1-9ecb-4e5c-920b-06413bfbf4f7"
},
"sort": ["4b9f0d17bc2333d13a1963b4f6afb829"]
}, {
"fields": {
"id": "de3dddb8-e296-4446-ac4c-135cc925669d"
},
"sort": ["4c5d0bcb50f5b600e539ba46b33b1007"]
}, {
"fields": {
"id": "c83ad22e-80b4-40f1-8e56-2153a1a1f9e8"
},
"sort": ["55efe0a692ab3205405f1c74732b8205"]
}, {
"fields": {
"id": "7bd19829-4f37-4e02-9fd1-0239b8ae8db4"
},
"sort": ["5adcd22c7c507244d7ba382812accdf3"]
}, {
"fields": {
"id": "42fcec43-851f-4133-a8db-1d2bf0b86ec8"
},
"sort": ["6757f46bd554e3353a2ebf35c6b3d24c"]
}, {
"fields": {
"id": "e119132b-4e93-4047-8513-1ce2452f0cdd"
},
"sort": ["6dbcb59a2b5e91523896d57695251b29"]
}, {
"fields": {
"id": "7d0acf5d-7c14-45a2-97b7-17939ff512f4"
},
"sort": ["9d99752ec0802e55dcfb3c83bcd2e4bb"]
}, {
"fields": {
"id": "2cdc21e4-3312-460b-9a18-094e4f95a56c"
},
"sort": ["9dc43d1d39e64cfe04c6d7b8f565faaa"]
}, {
"fields": {
"id": "0f665cb3-5648-416c-b08f-146d2a019319"
},
"sort": ["b61bb718fe63a287b6fcdc8bcd638604"]
}, {
"fields": {
"id": "1e852d49-2b3b-4d7a-9f1b-1495b94e723e"
},
"sort": ["ba7ad8a3a6e195a6bc28e341f9d6965b"]
}, {
"fields": {
"id": "ca2a5922-bb42-4317-b61c-129925436a1f"
},
"sort": ["bca0411cf8d67b4dcd5b205a5010367f"]
}, {
"fields": {
"id": "b1dac760-7d73-4b60-bd6d-08ea9453e68c"
},
"sort": ["be3714cfb2517e98d525aaea6e40cfa5"]
}, {
"fields": {
"id": "c4b08def-59db-4ac0-b16f-0c3fae4c01f2"
},
"sort": ["c4220b31c305d536c7a7d1639da32c66"]
}, {
"fields": {
"id": "cc7ac1fd-3e88-4503-a837-2000ebb6e2d9"
},
"sort": ["ceb5710fe2418fe3b353bf7b1f737570"]
}, {
"fields": {
"id": "5a5f90c9-b44f-4ca2-9d16-117c8e9fd388"
},
"sort": ["dc5fea76598633cb08c1459983ebca62"]
}, {
"fields": {
"id": "6d811d5b-4138-4a41-a186-1b9aa2b65623"
},
"sort": ["ea3c55ac123ac9e819b145402407d1de"]
}, {
"fields": {
"id": "b489d2da-b4a1-44de-acde-219109edd42f"
},
"sort": ["fab53cc11983b45b081d4b01df555c59"]
}]
}
}
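As a rough sanity check of that observation, the sketch below compares where the last key on the page sits in the hash range with where it would be expected to sit if all 196 hits had been sorted together. It assumes the MD5 keys are spread roughly uniformly, in which case the k-th smallest of n keys lands near k / (n + 1) of the range. The helper name `position` is hypothetical.

```python
TOTAL_HITS = 196  # "total" in the response above
PAGE_SIZE = 20    # "size" in the query

# First and last sort keys from the first page of the response above.
first_key = "0cbf8b4e7927f0a53a5b82f2630ff9ad"
last_key = "fab53cc11983b45b081d4b01df555c59"


def position(md5_hex: str) -> float:
    """Map a 32-character hex MD5 digest to a position in [0, 1)."""
    return int(md5_hex, 16) / 16 ** 32


# Expected position of the 20th smallest key among 196 uniform keys.
expected_last = PAGE_SIZE / (TOTAL_HITS + 1)

print(f"expected position of the last key on page 1: {expected_last:.2f}")
print(f"observed position of the first key: {position(first_key):.2f}")
print(f"observed position of the last key:  {position(last_key):.2f}")
```

If the observed position of the last key is far above the expected one, the first page spans most of the hash range, which is consistent with the sort having been applied to only those 20 documents rather than to the full result set.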