On my Elasticsearch server I have three indices: Person, Archive and Document.

  • Each document has an archive field, which is the _id of the Archive it is in.

  • Each archive has an owner field, which is the _id of the Person who owns the archive.

With the indices above I can aggregate documents into buckets of archives and archives into buckets of owners.

How can I also include the documents in the person aggregations, so that if I filter on a specific person I get that person's archives and their documents, instead of only the archives?


This is what I have so far to filter and aggregate the archives into buckets of owners:

{
  "post_filter": {
    "terms": {
      "owner": [
        "my_owner_id"
      ]
    }
  },
  "aggs": {
    "_filter_archive": {
      "filter": {
        "terms": {
          "owner": [
            "my_owner_id"
          ]
        }
      },
      "aggs": {
        "archive": {
          "terms": {
            "field": "archive"
          }
        }
      }
    }
  }
}
Oskar Persson
  • You have modeled your Elasticsearch indices with a relational-database mindset, which is usually the wrong approach. With the current configuration, you cannot achieve what you want. – Andrei Stefan Jan 14 '18 at 15:42
  • @AndreiStefan Do you have a recommendation for a functional Elasticsearch model? – Oskar Persson Jan 14 '18 at 15:43
  • If possible, denormalize your data and have everything in a single index. What ES version are you using? – Andrei Stefan Jan 14 '18 at 15:44
  • I'm using ES 6.1 – Oskar Persson Jan 14 '18 at 15:45
  • How many documents can an archive have, and how many archives can a person have? – Andrei Stefan Jan 14 '18 at 15:46
  • There are no limits on any of them – Oskar Persson Jan 14 '18 at 15:48
  • I meant realistically, not theoretically. – Andrei Stefan Jan 14 '18 at 15:48
  • I would guess hundreds of archives and hundreds of thousands of documents – Oskar Persson Jan 14 '18 at 16:04
  • How often will a Person be updated? And the same question for an Archive. – Andrei Stefan Jan 14 '18 at 16:06
  • I would guess weekly for both – Oskar Persson Jan 14 '18 at 16:14
  • It would be good if you could think these requirements through a bit more. These things matter in a scenario like this. What operations are possible on an archive (adding a document, removing a document, etc.)? I mean a realistic, practical scenario here, so it would be good if the product owner/manager had this information. – Andrei Stefan Jan 14 '18 at 16:17
  • This type of aggregation that you need: is it something you'd use frequently or very rarely? Do you need those relationships for other queries/aggregations? If not, maybe you could run two or more queries to achieve what you need and not have to rely on ES relationships. – Andrei Stefan Jan 14 '18 at 16:34
  • Only one query and pretty often – Oskar Persson Jan 14 '18 at 16:35
  • Will the results of this query be displayed in a web page or similar? What is the point of showing one Person with an Archive and 10000 Documents? Is this even usable? Can you show a list of Persons, each with their own Archives, and then expand the list of documents when an Archive is clicked? – Andrei Stefan Jan 14 '18 at 16:37
  • The query will only display the documents but I want the facet filter for persons to filter the documents and not only the archives – Oskar Persson Jan 14 '18 at 17:12
    This will be difficult to answer because it seems you are missing some details. The easy answer is: use `nested` documents or parent-child relationship. Which one to use in your case depends on a lot of factors. My suggestion is to try them both and test. See how well they perform. The third option is to denormalize your data completely. That's the reason I asked about updates, how frequent they are, how large a Person document is, how large an Archive document is etc. If you are not prepared to answer these questions, then test `nested` and parent-child and choose one or the other. Good luck! – Andrei Stefan Jan 15 '18 at 07:36

1 Answer

This will be difficult to answer because some details seem to be missing. The easy answer is: use `nested` documents or a parent-child relationship. Which one to use in your case depends on a lot of factors. My suggestion is to try them both, test, and see how well they perform. The third option is to denormalize your data completely. That's the reason I asked about updates: how frequent they are, how large a Person document is, how large an Archive document is, etc. If you are not prepared to answer these questions, then test `nested` and parent-child and choose one or the other. Good luck!
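
To make the third option (full denormalization) more concrete, here is a minimal Python sketch rather than a definitive implementation. It assumes each document is reindexed with its archive's owner copied onto it, and that `owner` and `archive` are mapped as keyword fields in the documents index. The helper names `denormalize` and `build_search_body` and the sample ids are hypothetical, and the plain dicts stand in for records you would fetch and reindex yourself.

```python
import json


def denormalize(documents, archives):
    """Copy each archive's owner onto the documents stored in that archive."""
    owner_by_archive = {a["_id"]: a["owner"] for a in archives}
    return [
        {**doc, "owner": owner_by_archive[doc["archive"]]}
        for doc in documents
    ]


def build_search_body(owner_ids):
    """Request body for the documents index (assumed layout).

    post_filter narrows the returned documents to the selected owners,
    while the aggregation still buckets all matching documents by owner
    and then by archive, which is the usual facet behaviour.
    """
    return {
        "post_filter": {"terms": {"owner": list(owner_ids)}},
        "aggs": {
            "owner": {
                "terms": {"field": "owner"},
                "aggs": {"archive": {"terms": {"field": "archive"}}},
            }
        },
    }


# Hypothetical sample data, not taken from the question.
archives = [{"_id": "a1", "owner": "p1"}, {"_id": "a2", "owner": "p2"}]
documents = [
    {"_id": "d1", "archive": "a1"},
    {"_id": "d2", "archive": "a1"},
    {"_id": "d3", "archive": "a2"},
]

flat_docs = denormalize(documents, archives)
body = build_search_body(["my_owner_id"])
print(json.dumps(body, indent=2))
```

The trade-off of this layout is on the write side: when an archive changes owner, every document in that archive has to be updated, which is why the update frequency and the number of documents per archive matter.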

Andrei Stefan