2

I trying to get the maximum value of document from the same name records. Forexample, I have 3 users, 2 of them have same name but different followers count, I wanted to return only 1 document from the 2 same with same name based on the maximum of followers_count.

{ id: 1, name: "John Greenwood", follower_count: 100 }
{ id: 2, name: "John Greenwood", follower_count: 200 }
{ id: 3, name: "John Underwood", follower_count: 300 }

So the result would be,

{ id: 2, name: "John Greenwood", follower_count: 200 }
{ id: 3, name: "John Underwood", follower_count: 300 }

From 2 same names, the one with the maximum followers wins and other single one will also come.

I have mapping as follow,

"users-development" : {
    "mappings" : {
      "user" : {
        "dynamic" : "false",
        "properties" : {
          "follower_count" : {
            "type" : "integer"
          },
          "name" : {
            "type" : "string",
            "fields" : {
              "exact" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          },
        }
      }
    }

This is where I have been stucked from long,

         {
            query: {
              filtered: {
                filter: {
                  bool: {
                    must: [
                      { terms: { "name.exact": [ "John Greenwood", "John Underwood" ] } },
                    ]
                  }
                }
              }
            },

            aggs: {
              max_follower_count: { max: { field: 'follower_count' } }
            },

            size: 1000,
          }

Any suggestions please

Muhamamd Awais
  • 2,385
  • 14
  • 25
  • this is the only thing I've had time to find: http://stackoverflow.com/questions/18449703/elasticsearch-group-by-multiple-fields#18450596 will try to post back again with more specifics – Matthew Clark Feb 18 '16 at 20:08
  • Is that the full query? Do you get any error messages? Your trailing comma there at the end of `size:1000,` would cause an issue. I ran the query with your data, and did not have a problem otherwise. – IanGabes Feb 18 '16 at 20:14
  • @IanGabes no I don't get any error messages I am working in ruby – Muhamamd Awais Feb 18 '16 at 20:36

2 Answers2

3

Your question have a special tool in the elastic stack as a hammer for a head kkk. Are Aggregations, See the examples: First of all in your case you will need aggregate by full name including spaces, your name field need to be not_analyzed like this

`PUT /index
{
  "mappings": {
    "users" : {
      "properties" : {
        "name" : {
          "type" :    "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}`

Now your query will be like this one:

`POST /index/users/_search
{
   "aggs": {
      "users": {
         "terms": {
            "field": "name"
         },
         "aggs": {
            "followers": {
               "max": {
                  "field": "follower_count"
               }
            }
         }
      }
   }
}`

I just aggregated by name and used a max metric to get the higgest follower count.

The response will be like this:

`"aggregations": {
      "users": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "John Greenwood",
               "doc_count": 2,
               "followers": {
                  "value": 200
               }
            },
            {
               "key": "John Underwood",
               "doc_count": 1,
               "followers": {
                  "value": 300
               }
            }
         ]
      }
   }`

Hope that will be good for you. Use aggregations for all situations that you need aggregate data and get sum on values.

Waldemar Neto
  • 826
  • 6
  • 17
  • 1
    awesome seems like a good solution, What if I have another field where I wanted to give preference one true on false and if that preference is applied I don't wanted to apply the aggregation. Thoughts – Muhamamd Awais Feb 18 '16 at 21:40
  • @MuhamamdAwais probably this is the awnser of your question http://stackoverflow.com/questions/28050292/how-to-aggregate-boolean-values-in-elastic-search – Waldemar Neto Feb 18 '16 at 23:01
0

Ok, I think you are looking for something along these lines, using the terms aggregation

{
   "query": {
      "terms": { "name.exact": [ "John Greenwood", "John Underwood" ] }
   },
   "aggs": {
      "max_follower_count": {
         "terms": {
            "field":"name.exact"
         },
         "aggs":{
             "max_follow" : { "max" : { "field" : "follower_count" } }
         }
      }
   },
   "size": 1000
}

The terms aggregation will make a bucket for each unique value, from names.exact, which will only be those specified in your terms query. So we now have a bucket for both Johns, now we can use the max aggregation to count who has the most followers. The max aggregation will operate on each bucket in its parent aggregation.

Each of these unique terms will then have its max value of follower_count computed, and displayed in the bucket. Results look as follows:

... //query results of just the terms query up here
"aggregations": {
  "max_follower_count": {
     "doc_count_error_upper_bound": 0,
     "sum_other_doc_count": 0,
     "buckets": [
        {
           "key": "John Greenwood",
           "doc_count": 2,
           "max_follow": {
              "value": 200
           }
        },
        {
           "key": "John Underwood",
           "doc_count": 1,
           "max_follow": {
              "value": 300
           }
        }
     ]
  }
}

The terms aggregation comes with a few caveats with how it does the counting, and the documentation linked should be pretty clear on that.

IanGabes
  • 2,652
  • 1
  • 19
  • 27