
I am using Elasticsearch via NEST (C#). I have a large list of information about people, with documents like this:

{
   firstName: 'Frank',
   lastName: 'Jones',
   City: 'New York'
}

I'd like to filter this list by lastName and also order the results by the length of the name, so that people with only 5 characters in their name appear at the beginning of the result set, ahead of people with 10 characters.

In pseudocode, I'd like to do something like list.wildcard("j*").sort(m => m.lastName.length)

Yu Hao
mvcNewbie

1 Answer


You can order results by length with script-based sorting.

As a toy example, I set up a trivial index with a few documents:

PUT /test_index

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name":"Bob"}
{"index":{"_id":2}}
{"name":"Jeff"}
{"index":{"_id":3}}
{"name":"Darlene"}
{"index":{"_id":4}}
{"name":"Jose"}

Then I can order search results like this:

POST /test_index/_search
{
   "query": {
      "match_all": {}
   },
   "sort": {
      "_script": {
         "script": "doc['name'].value.length()",
         "type": "number",
         "order": "asc"
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 4,
      "max_score": null,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": null,
            "_source": {
               "name": "Bob"
            },
            "sort": [
               3
            ]
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "4",
            "_score": null,
            "_source": {
               "name": "Jose"
            },
            "sort": [
               4
            ]
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": null,
            "_source": {
               "name": "Jeff"
            },
            "sort": [
               4
            ]
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": null,
            "_source": {
               "name": "Darlene"
            },
            "sort": [
               7
            ]
         }
      ]
   }
}
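The question also asked for a wildcard match on lastName. As a rough sketch of how the two pieces might be combined, the snippet below only assembles the request body in Python; it does not talk to Elasticsearch. The field name lastName and the pattern j* come from the question, the helper name build_search_body is hypothetical, and the script-sort shape is the ES 1.x form used in the request above.

```python
import json


def build_search_body(field, pattern):
    """Assemble a search body: wildcard match on `field`, shortest values first.

    Hypothetical helper for illustration; it only builds the JSON structure.
    """
    return {
        "query": {"wildcard": {field: pattern}},
        "sort": {
            "_script": {
                # Same script-sort shape as the request above (ES 1.x syntax).
                "script": "doc['%s'].value.length()" % field,
                "type": "number",
                "order": "asc",
            }
        },
    }


body = build_search_body("lastName", "j*")
print(json.dumps(body, indent=3))
```

Wildcard patterns are not analyzed, so a lowercase j* matches terms from an analyzed field (which the standard analyzer lowercases), while a not_analyzed field would need J* to match "Jones".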

To filter by length, I can use a script filter in a similar way:

POST /test_index/_search
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "script": {
               "script": "doc['name'].value.length() > 3",
               "params": {}
            }
         }
      }
   },
   "sort": {
      "_script": {
         "script": "doc['name'].value.length()",
         "type": "number",
         "order": "asc"
      }
   }
}
...
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": null,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "4",
            "_score": null,
            "_source": {
               "name": "Jose"
            },
            "sort": [
               4
            ]
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": null,
            "_source": {
               "name": "Jeff"
            },
            "sort": [
               4
            ]
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": null,
            "_source": {
               "name": "Darlene"
            },
            "sort": [
               7
            ]
         }
      ]
   }
}
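To sanity-check the expected result without a cluster, the same filter and sort can be mimicked in plain Python over the four toy names. This is only a local stand-in for what the two scripts compute, not how Elasticsearch evaluates them.

```python
names = ["Bob", "Jeff", "Darlene", "Jose"]  # the four toy documents

# Keep names longer than 3 characters, like the script filter above.
kept = [n for n in names if len(n) > 3]

# Order by length ascending, like the script sort. sorted() is stable,
# so equal-length names keep their input order here.
ordered = sorted(kept, key=len)

print(ordered)  # ['Jeff', 'Jose', 'Darlene']
```

The response above lists Jose before Jeff; both have length 4, and the relative order of ties is not determined by the sort value alone.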

Here's the code I used:

http://sense.qbox.io/gist/22fef6dc5453eaaae3be5fb7609663cc77c43dab

P.S.: If any of the last names contain spaces, you might want to set "index": "not_analyzed" on that field in the mapping, so that the script sees the whole name rather than individual tokens.
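As a sketch of what that could look like, the snippet below builds an index-creation body in ES 1.x mapping syntax (type "doc", field "name", matching the toy example) and shows roughly why spaces matter. The split on whitespace is only a crude stand-in for the standard analyzer, and the body is assumed to be sent with PUT /test_index before any documents are indexed.

```python
import json

# Index-creation body: store "name" as a single, unanalyzed term (ES 1.x syntax).
index_body = {
    "mappings": {
        "doc": {
            "properties": {
                "name": {
                    "type": "string",
                    "index": "not_analyzed",
                }
            }
        }
    }
}
print(json.dumps(index_body, indent=3))

# Crude approximation of analysis: lowercase and split on whitespace.
# With an analyzed field the script works on terms like these,
# not on the original string "Bob Smith".
tokens = "Bob Smith".lower().split()
print(tokens)                                   # ['bob', 'smith']
print([len(t) for t in tokens], len("Bob Smith"))  # [3, 5] 9
```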

Sloan Ahrens
  • Let's say he performs this search often, on a lot of documents. Would it be worth just indexing the length? – Robin Apr 21 '15 at 20:52
  • Thanks for such great feedback. @Robin: for the most part my data won't change, so I feel indexing the length would be beneficial. If you have any references you can point me to, that would be great. – mvcNewbie Apr 21 '15 at 20:55
  • Hit the enter button too soon. I also ran into a problem using Sense, where it throws an exception stating that groovy is disabled. Specifying groovy as the language and removing the underscore on _script then leads me down a path of more errors. I know this is trivial, but do you have any suggestions on an environment to run your example? I tried the link you sent to qBox but it took me to an empty editor; maybe you forgot to save the code? – mvcNewbie Apr 21 '15 at 20:57
  • I ran this against ES 1.3.4. With 1.4+ you'll have to enable CORS to use the Sense app, and dynamic scripting as well. Indexing the length is probably a better production solution, though. You'll have to calculate the lengths in your code, I think. Not sure why the editor is empty for you, though; it works for me. What browser are you using? Do you see any JavaScript errors in the console? Could you post them somewhere for me? – Sloan Ahrens Apr 21 '15 at 21:02
  • Thanks Sloan, it could be the firewall I'm behind, I'll give it another go later tonight – mvcNewbie Apr 21 '15 at 21:03
  • Yeah, the firewall could be the issue. It has to make an AJAX request here to get the code: http://2694d39446e53877000.qbox.io/sense/gist/22fef6dc5453eaaae3be5fb7609663cc77c43dab – Sloan Ahrens Apr 21 '15 at 21:05
  • Enabling dynamic scripts was the key, thanks for the tip Sloan. I added the following data to the existing example: {"index":{"_id":5}} {"name":"Bob Smith"} {"index":{"_id":6}} {"name":"Jeff Husenburger"} {"index":{"_id":7}} {"name":"Darlene Si"} {"index":{"_id":8}} {"name":"Jose Whatchamcalit"}. Since I have spaces, I also added the not_analyzed parameter as you suggested, so my query looks like this: – mvcNewbie Apr 21 '15 at 21:36
  • POST /test_index/_search { "query": { "match_all": {} }, "sort": { "_script": { "script": "doc['name'].value.length()", "type": "number", "order": "asc", "index": "not_analyzed" } } } – mvcNewbie Apr 21 '15 at 21:36
  • Sorry about the multiple comments. It looks like the space in there throws everything off. The "index": "not_analyzed" didn't seem to help, unless I didn't use it correctly. – mvcNewbie Apr 21 '15 at 21:37
  • `"index": "not_analyzed"` goes in your mapping, not in your query. You can set it in the field itself, or in a sub-field, as described here: http://stackoverflow.com/questions/29614818/elastic-search-allow-user-to-optionally-use-an-exact-match, but it has to be done when you define the index (or at least, before you index documents with that field). – Sloan Ahrens Apr 21 '15 at 21:53
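
Following up on the suggestion in the comments to index the length rather than compute it per query, here is a minimal sketch of calculating it in client code before indexing. The field name nameLength and the helper bulk_lines are hypothetical; the snippet only builds the bulk payload and a search body that sorts on the stored number, and leaves sending them out.

```python
import json


def bulk_lines(names):
    """Yield bulk-API lines for each name, adding a precomputed length field."""
    for i, name in enumerate(names, start=1):
        yield json.dumps({"index": {"_id": i}})
        # nameLength is a hypothetical field holding the precomputed length.
        yield json.dumps({"name": name, "nameLength": len(name)})


names = ["Bob", "Jeff", "Darlene", "Jose"]
payload = "\n".join(bulk_lines(names)) + "\n"  # bulk bodies end with a newline

# Sort on the stored number instead of running a script for every document.
search_body = {
    "query": {"match_all": {}},
    "sort": [{"nameLength": {"order": "asc"}}],
}

print(payload)
print(json.dumps(search_body, indent=3))
```

The trade-off is that the length has to be kept in sync whenever a name changes, which matters little when the data is mostly static.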