
I have some customer documents that I want to retrieve from ElasticSearch based on where the customers come from (that is, the country field is IN an array of countries).

[
  {
    "name": "A1",
    "address": {
      "street": "1 Downing Street",
      "country": {
        "code": "GB",
        "name": "United Kingdom"
      }
    }
  },
  {
    "name": "A2",
    "address": {
      "street": "25 Gormut Street",
      "country": {
        "code": "FR",
        "name": "France"
      }
    }
  },
  {
    "name": "A3",
    "address": {
      "street": "Bonjour Street",
      "country": {
        "code": "FR",
        "name": "France"
      }
    }
  }
]

Now, I have an array in my Python code:

["DE", "FR", "IT"]

I'd like to obtain the two documents, A2 and A3.

How would I write this in PyES/Query DSL? Am I supposed to be using an ExistsFilter or a TermQuery for this? ExistsFilter seems to only check whether the field exists, not what its value is.

Mark

1 Answer


In NoSQL-type document stores, all you get back is the document, not parts of the document.

Your requirement: "I'd like to obtain the two documents, A2 and A3." implies that you need to index each of those documents separately, not as an array inside another "parent" document.

If you need to match values of the parent document alongside country then you need to denormalize your data and store those values from the parent doc inside each sub-doc as well.

Once you've done the above, then the query is easy. I'm assuming that the country field is mapped as:

country: { type: "string", index: "not_analyzed" }
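
In the question's documents the country code sits at a nested path (address.country.code), so the same field definition would have to be nested under "properties" at each level. A minimal sketch of building such a mapping body in Python follows. The helper name and the type name "customer" are hypothetical, and sending the result to the put-mapping endpoint is left to whichever client you use.

```python
import json


def not_analyzed_mapping(type_name, field_path):
    """Build a mapping body that marks one (possibly nested) field as not_analyzed.

    type_name is a hypothetical document type; field_path is a dotted path
    such as "address.country.code".
    """
    # Leaf definition taken from the answer's assumed mapping.
    node = {"type": "string", "index": "not_analyzed"}
    # Wrap the leaf in one "properties" level per path segment, innermost first.
    for part in reversed(field_path.split(".")):
        node = {"properties": {part: node}}
    return {type_name: node}


mapping = not_analyzed_mapping("customer", "address.country.code")
print(json.dumps(mapping, indent=2))
```

Fields not listed in the mapping would still be picked up by dynamic mapping, so only the fields that behave like enums need to be spelled out this way.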

To find docs with DE, you can do:

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1'  -d '
{
   "query" : {
      "constant_score" : {
         "filter" : {
            "term" : {
               "country" : "DE"
            }
         }
      }
   }
}
'

To find docs with either DE or FR:

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1'  -d '
{
   "query" : {
      "constant_score" : {
         "filter" : {
            "terms" : {
               "country" : [
                  "DE",
                  "FR"
               ]
            }
         }
      }
   }
}
'
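
Since the list of countries lives in Python, the same request body can be built from that list directly. This is only a sketch of constructing the Query DSL as a plain dict: the function name is made up for illustration, the field name defaults to "country" as assumed in the mapping above, and how the body is sent (PyES, or any HTTP client) is not shown.

```python
import json


def countries_filter_query(codes, field="country"):
    """Return a constant_score query matching docs whose `field` is any of `codes`.

    Assumes `field` is mapped as not_analyzed, so values match exactly.
    """
    return {
        "query": {
            "constant_score": {
                "filter": {
                    # "terms" behaves like SQL's IN: any listed value matches.
                    "terms": {field: list(codes)}
                }
            }
        }
    }


body = countries_filter_query(["DE", "FR", "IT"])
print(json.dumps(body, indent=2))
```

For the documents in the question, the field argument would presumably be the full path, for example countries_filter_query(["DE", "FR", "IT"], field="address.country.code").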

To combine the above with some other query terms:

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1'  -d '
{
   "query" : {
      "filtered" : {
         "filter" : {
            "terms" : {
               "country" : [
                  "DE",
                  "FR"
               ]
            }
         },
         "query" : {
            "text" : {
               "address.street" : "bonjour"
            }
         }
      }
   }
}
'
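
The combined form can be generated the same way from Python values. Again a sketch rather than a definitive implementation: the function and parameter names are hypothetical, and the dict simply mirrors the "filtered" query shown in the curl example.

```python
import json


def filtered_street_query(codes, street_text,
                          country_field="country",
                          street_field="address.street"):
    """Return a filtered query: full-text match on the street, restricted by country."""
    return {
        "query": {
            "filtered": {
                # The filter narrows the candidate docs without affecting scoring.
                "filter": {"terms": {country_field: list(codes)}},
                # The query part is scored as usual against the street text.
                "query": {"text": {street_field: street_text}},
            }
        }
    }


body = filtered_street_query(["DE", "FR"], "bonjour")
print(json.dumps(body, indent=2))
```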

Also see this answer for an explanation of how arrays of objects can be tricky, because of the way they are flattened:

Is it possible to sort nested documents in ElasticSearch?

DrTech
  • Oh, okay...maybe I wasn't clear with my explanation. The provided snippet was actually what I got back from a query through elastic search. I just wanted to further refine the query by filtering by country. Is it always required to make your own custom mapping? I'm currently just relying on the default dynamic mapping that ElasticSearch uses to index my documents. – Mark Aug 19 '12 at 10:00
  • OK right - that wasn't clear from the question. Regarding making a custom mapping, no you don't have to create one, but I strongly recommend that you do. That way you know exactly what is happening - you're not relying on heuristics. Your country field is currently being analyzed as full text, which it isn't - it's an enum. That may make little difference in this case, but may result in strange behaviour in other fields later on (depending on their content). Creating your mapping is easy, and a good habit to adopt. – DrTech Aug 26 '12 at 09:21
  • Hmm...thanks for your reply. I guess I just can't be lazy. Reason being my couchdb documents are crazy large nested structures....mapping everything is gonna be a chore, and I have 9 different types of documents. – Mark Aug 28 '12 at 09:10