0

I'm trying to get all the documents with highest field value (+ conditional term filter)

Given the Employees mapping

Name     Department     Salary
----------------------------
Tomcat   Dev             100
Bobcat   QA               90
Beast    QA              100 
Tom      Dev             100
Bob      Dev              90

In SQL it would look like

select * from Employees  where Salary = select max(salary) from Employees 

expected output

Name     Department     Salary
----------------------------
Tomcat   Dev             100
Beast    QA              100 
Tom      Dev             100

and

select * from Employees  where Salary = (select max(salary) from Employees where Department ='Dev'  )

expected output

Name     Department     Salary
----------------------------
Tomcat   Dev             100
Tom      Dev             100

Is it possible with Elasticsearch ?

Alexander.Furer
  • 1,817
  • 1
  • 16
  • 24

1 Answers1

0

The below should help:

Looking at your data, note that I've come up with the below mapping:

Mapping:

PUT my-salary-index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "department":{
        "type": "keyword"
      },
      "salary":{
        "type": "float"
      }
    }
  }
}

Sample Documents:

POST my-salary-index/_doc/1
{
  "name": "Tomcat",
  "department": "Dev",
  "salary": 100
}

POST my-salary-index/_doc/2
{
  "name": "Bobcast",
  "department": "QA",
  "salary": 90
}

POST my-salary-index/_doc/3
{
  "name": "Beast",
  "department": "QA",
  "salary": 100
}

POST my-salary-index/_doc/4
{
  "name": "Tom",
  "department": "Dev",
  "salary": 100
}

POST my-salary-index/_doc/5
{
  "name": "Bob",
  "department": "Dev",
  "salary": 90
}

Solutions:

Scenario 1: Return all employees with max salary

POST my-salary-index/_search
{
  "size": 0,
  "aggs": {
    "my_employees_salary":{
      "terms": {
        "field": "salary",
        "size": 1,                   <--- Note this
        "order": {
          "_key": "desc"
        }
      },
      "aggs": {
        "my_employees": {
          "top_hits": {             <--- Note this. Top hits aggregation
            "size": 10
          }
        }
      }
    }
  }
}

Note that I've made use of Terms Aggregation with Top Hits aggregation chained to it. I'd suggest to go through the links to understand both the aggregations.

So basically you just need to retrieve the first element in the Terms Aggregation that is why I've mentioned the size: 1. Also note the order, just in case if you requirement to retrieve the lowest.

Scenario 1 Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_employees" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 2,
      "buckets" : [
        {
          "key" : 100.0,
          "doc_count" : 3,
          "employees" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : 1.0,
              "hits" : [
                {
                  "_index" : "my-salary-index",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_score" : 1.0,
                  "_source" : {
                    "name" : "Tomcat",
                    "department" : "Dev",
                    "salary" : 100
                  }
                },
                {
                  "_index" : "my-salary-index",
                  "_type" : "_doc",
                  "_id" : "3",
                  "_score" : 1.0,
                  "_source" : {
                    "name" : "Beast",
                    "department" : "QA",
                    "salary" : 100
                  }
                },
                {
                  "_index" : "my-salary-index",
                  "_type" : "_doc",
                  "_id" : "4",
                  "_score" : 1.0,
                  "_source" : {
                    "name" : "Tom",
                    "department" : "Dev",
                    "salary" : 100
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Scenario 2: Return all employee with max salary from particular department

POST my-salary-index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "department": "Dev"
          }
        }
      ]
    }
  }, 
  "aggs": {
    "my_employees_salary":{
      "terms": {
        "field": "salary",
        "size": 1,
        "order": {
          "_key": "desc"
        }
      },
      "aggs": {
        "my_employees": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}

For this, there are many ways to do this, but the idea is that you basically filter the documents before you apply aggregation on top of it. That way it would be more efficient. Note that I'v just added a bool condition to the aggregation query mentioned in solution for Scenario 1.

Scenario 2 Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_employees_salary" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 1,
      "buckets" : [
        {
          "key" : 100.0,
          "doc_count" : 2,
          "my_employees" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : 0.53899646,
              "hits" : [
                {
                  "_index" : "my-salary-index",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_score" : 0.53899646,
                  "_source" : {
                    "name" : "Tomcat",
                    "department" : "Dev",
                    "salary" : 100
                  }
                },
                {
                  "_index" : "my-salary-index",
                  "_type" : "_doc",
                  "_id" : "4",
                  "_score" : 0.53899646,
                  "_source" : {
                    "name" : "Tom",
                    "department" : "Dev",
                    "salary" : 100
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}

You can also think of making use of SQL Access if you have complete xpack or rather licensed version of x-pack.

Hope this helps.

Kamal Kunjapur
  • 8,547
  • 2
  • 22
  • 32