0

We have an elastic search document which has a dimension called city. Each document will have only one value for city field. I have a scenario where I need to query the person based on the city or cities.

Documents in Elasticsearch

{
  person_id: "1",
  property_value : 25000,
  city: "Bangalore"
}
{
  person_id: "2",
  property_value : 100000,
  city: "Bangalore"
}
{
  person_id: "1",
  property_value : 15000,
  city: "Delhi"
}

Note: The aggregation should be performed on property_value and group by on person_id.

For eg.,

  1. If I query for Bangalore it should return document with person_id 1 and 2.
  2. If I query for both Delhi and Bangalore it should return this

    { person_id: "1", property_value : 40000, city: ["Bangalore", "Delhi"] }

Garvit Khamesra
  • 375
  • 2
  • 9

1 Answers1

1

Looking at your data, I've come up with a sample mapping, request query and the response.

Mapping:

PUT my_index_city
{
  "mappings": {
    "properties": {
      "person_id":{
        "type": "keyword"
      },
      "city":{
        "type":"text",
        "fields":{
          "keyword":{       
            "type": "keyword"
          }
        }
      },
      "property_value":{
        "type": "long"
      }
    }
  }
}

Sample Request:

Note that I've made use of simple query string to filter the documents having Bangalore and Delhi.

For aggregation I've made use of Terms Aggregation on person_id and Sum Aggregation on the property_value field.

POST my_index_city/_search
{
  "size": 0, 
  "query": {
    "query_string": {
      "default_field": "city",
      "query": "Bangalore Delhi"
    }
  },
  "aggs": {
    "my_person": {
      "terms": {
        "field": "person_id",
        "size": 10,
        "min_doc_count": 2
      },
      "aggs": {
        "sum_property_value": {
          "sum": {
            "field": "property_value"
          }
        }
      }
    }
  }
}

Sample Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "my_person" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 2,
          "sum_property_value" : {
            "value" : 40000.0
          }
        }
      ]
    }
  }
}

Note: This query would only work if the person_id has multiple documents but each document having unique/different city value.

What I mean to say is, if the person_id has multiple documents with same city, the aggregation would not give right answer.

Updated Answer:

There is no direct way to achieve what you are looking for unless you modify the mapping. What I've done is, made use of nested datatype and ingested all the documents for person_id as a single document.

Mapping:

PUT my_sample_city_index
{
  "mappings": {
    "properties": {
      "person_id":{
        "type": "keyword"
      },
      "property_details":{
        "type":"nested",               <------ Note this
        "properties": {
          "city":{
            "type": "text",
            "fields":{
              "keyword":{
                "type":"keyword"
              }
            }
          },
          "property_value":{
            "type": "long"
          }
        }
      }
    }
  }
}

Sample Documents:

POST my_sample_city_index/_doc/1
{
  "person_id": "1",
  "property_details":[
    {
      "property_value" : 25000,
      "city": "Bangalore"
    },
    {
      "property_value" : 15000,
      "city": "Delhi"
    }
    ]
}

POST my_sample_city_index/_doc/2
{
  "person_id": "2",
  "property_details":[
    {
      "property_value" : 100000,
      "city": "Bangalore"
    }
    ]
}

Aggregation Query:

POST my_sample_city_index/_search
{
  "size": 0,
  "query": {
    "nested": {
      "path": "property_details",
      "query": {
        "query_string": {
          "default_field": "property_details.city",
          "query": "bangalore delhi"
        }
      }
    }
  },
  "aggs": {
    "persons": {
      "terms": {
        "field": "person_id",
        "size": 10
      },
      "aggs": {
        "property_sum": {
          "nested": {                          <------ Note this
            "path": "property_details"
          },
          "aggs": {
            "total_sum": {
              "sum": {
                "field": "property_details.property_value"
              }
            }
          }
        }
      }
    }
  }
}

Note that I've applied initially a term query on person_id post which I've applied Nested Aggregation, further on which I've applied metric sum aggregation query.

This should also work correctly if a person has multiple properties in the same city.

Response:

{
  "took" : 31,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "persons" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 1,
          "property_sum" : {
            "doc_count" : 2,
            "total_sum" : {
              "value" : 40000.0
            }
          }
        },
        {
          "key" : "2",
          "doc_count" : 1,
          "property_sum" : {
            "doc_count" : 1,
            "total_sum" : {
              "value" : 100000.0
            }
          }
        }
      ]
    }
  }
}

Let me know if this helps!

Kamal Kunjapur
  • 8,547
  • 2
  • 22
  • 32