2

I have a string field in my document. Now I need to sort my documents based on the word counts of that field. How do I accomplish that in elasticsearch?

3 Answers

3

The best approach is to use the token_count type. To avoid disturbing the original string, use a multi-field and add a sub-field that keeps track of the token count alone.

A mapping like the one below should work:

{
    "tweet" : {
        "properties" : {
            "name" : {
                "type" : "multi_field",
                "fields" : {
                    "name" : {"type" : "string"},
                    "wordCount" : {"type" : "token_count", "analyzer" : "standard"}
                }
            }
        }
    }
}
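With such a mapping in place, sorting comes down to naming the sub-field in a `sort` clause. The following is a minimal sketch in Python that only builds the request body. It assumes the sub-field is addressable as `name.wordCount` (following the field names in the mapping), and the helper name `build_sort_body` is hypothetical. The resulting JSON would be sent to the index's `_search` endpoint.

```python
import json


def build_sort_body(field="name.wordCount", order="desc"):
    """Build a search body that sorts hits by a token-count sub-field.

    `field` is an assumption: the multi-field path from the mapping.
    """
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "query": {"match_all": {}},          # match every document
        "sort": [{field: {"order": order}}],  # order hits by word count
    }


if __name__ == "__main__":
    # Print the JSON body, e.g. to paste into a curl -d '...' call.
    print(json.dumps(build_sort_body(), indent=2))
```

Use `order="asc"` to list the shortest documents first.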
Vineeth Mohan
  • Can you please answer this https://stackoverflow.com/questions/51590179/search-part-of-string-with-elasticseach?noredirect=1#comment90147346_51590534 – Vidya L Jul 30 '18 at 10:04
0

Use a terms aggregation, like this:

curl -XGET 'http://localhost:9200/index_name/_search?pretty=1' -d '
    {
        "aggs": {
            "genders": {
                "terms": {
                    "field": "gender"
                }
            }
        }
    }'

Note: for the curl command, check this

This aggregates on the gender field and returns one bucket per distinct gender value. By default, the buckets are sorted by document count, highest first.

Neo-coder
  • This works for single-word fields but fails when multiple tokens are present as each token is counted separately. `Hello world`, `Hello my name is dave` -> `Hello` x 2, `name` x 1, `dave` x 1, `world` x 1 (`my` and `is` may or may not be stripped out depending on the analyzer you use). – Basic May 06 '15 at 16:15
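The difference described in the comment above can be illustrated with a small Python sketch. It is only a simulation: `tokenize` is a hypothetical stand-in that splits on whitespace, whereas a real analyzer may also lowercase terms or drop stop words. It contrasts per-term bucket counts (what a terms aggregation groups on) with per-document word counts (what the question asks for).

```python
from collections import Counter

docs = ["Hello world", "Hello my name is dave"]


def tokenize(text):
    # Stand-in for an analyzer (assumption): split on whitespace only.
    return text.split()


# Per-term counts: each distinct token becomes its own bucket,
# counted once per document that contains it.
term_doc_counts = Counter()
for doc in docs:
    term_doc_counts.update(set(tokenize(doc)))

# Per-document word counts: one number per document.
word_counts = {doc: len(tokenize(doc)) for doc in docs}

if __name__ == "__main__":
    print(term_doc_counts["Hello"])              # 2
    print(word_counts["Hello world"])            # 2
    print(word_counts["Hello my name is dave"])  # 5
```

The per-term counts say nothing about how long each document is, which is why a token count stored per document is the better fit for sorting.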
0

Your best bet is to store the token count alongside the original field. See the documentation in the Core Types here: http://www.elastic.co/guide/en/elasticsearch/reference/1.4/mapping-core-types.html#token_count

Then you would sort by field.word_count (where field is the 'parent' property).

samjudson