41

We are planning to introduce Elasticsearch (on AWS) for our multi-tenant application. We have the following options:

  1. Using One Index Per Tenant
  2. Using One Type Per Tenant
  3. All Tenants Share One Index with Custom routing

According to this blog post, https://www.elastic.co/blog/found-multi-tenancy, the first option would cause memory issues, but I am not clear about the other options.

It seems that with the third option there is no data segregation, and I am not sure about security.

I believe the second option would be better, as the data would be segregated.

Please help me identify the best option for using Elasticsearch with multi-tenancy.

Please note that we will be using AWS infrastructure.

Jasmel Pc
Selvakumar Ponnusamy
  • What is a tenant in your context? – Val Jan 26 '17 at 07:00
  • Each client is considered a tenant. – Selvakumar Ponnusamy Jan 26 '17 at 07:38
  • Then the answer depends on how many tenants/clients we are talking about (1-10, 10-100, 100-1000, ?) and the growth you're expecting, i.e. is the number of clients stable, or do you expect an x% increase within the next N months? When deciding which strategy to take, you need to think of tomorrow, not today. – Val Jan 26 '17 at 07:48
  • There is a 4th option that you haven't mentioned: all tenants share one *time-based* index with custom routing. That's the most flexible option when your client count will increase over time. – Val Jan 26 '17 at 08:19
  • Is there any difference between the third option and the fourth option you mention? Assume 10-1000 clients. – Selvakumar Ponnusamy Jan 26 '17 at 17:05
  • Yes, because you can control the size of your indices. If you have a single index, you'll have to live with it for eternity, and it will have to store everything for all your new clients. Whereas if you decide to have one index per month/year/you-name-it, you can ensure that your indices do not grow beyond a manageable size. – Val Jan 26 '17 at 17:07
  • I also have one more problem: each client would have different custom fields, and the field types differ as well, so I'm still deciding between a TYPE per client and an INDEX per client. – Selvakumar Ponnusamy Jan 27 '17 at 08:54
  • If fields with the same names can have different types depending on clients, then yes you'd need to store those clients in different indices since two types in the same index cannot have fields with the same name and different types... – Val Jan 27 '17 at 17:57
  • Hello @SelvakumarPonnusamy, I'd like to know which approach you chose. We have the same questions and are looking for past experience. I would appreciate it if you could share yours. Thanks. – Doston Jun 08 '20 at 09:00
  • I wonder if the memory issue is still relevant since this question and answer is 5 years old and I've read that in version 8.x of Elastic the memory overhead per shard has been significantly reduced – cah1r Aug 18 '22 at 11:34
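
The time-based variant suggested in the comments (one shared index per period, with custom routing per tenant) can be sketched roughly as follows. This is only an illustration under assumptions: the `tenants` prefix, the monthly period, the `tenant_id` field, and both helper functions are hypothetical names, and the code only builds the index name and routing value that a client would pass along with an index request.

```python
from datetime import date


def monthly_index(prefix: str, day: date) -> str:
    """Return the name of the monthly index a document dated `day` belongs to."""
    return f"{prefix}-{day:%Y.%m}"


def index_request(prefix: str, tenant_id: str, day: date, doc: dict) -> dict:
    """Describe an index request: target index, routing value, and document body.

    The tenant id is used as the routing value and is also stored in the
    document (hypothetical `tenant_id` field) so that searches can filter on it.
    """
    return {
        "index": monthly_index(prefix, day),
        "routing": tenant_id,
        "body": {**doc, "tenant_id": tenant_id},
    }


req = index_request("tenants", "client-42", date(2017, 1, 26), {"title": "hello"})
print(req["index"], req["routing"])
```

Because a new index starts every month, no single index has to hold all data for all future clients, which is the point made in the comments above.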

3 Answers

37

We are considering the same question right now, and the following set of articles by Elasticsearch was very helpful.

Start here: https://www.elastic.co/guide/en/elasticsearch/guide/current/scale.html

And read through each subsequent article until you hit this one: https://www.elastic.co/guide/en/elasticsearch/guide/current/finite-scale.html

The following two were very eye-opening for me:

https://www.elastic.co/guide/en/elasticsearch/guide/current/faking-it.html

https://www.elastic.co/guide/en/elasticsearch/guide/current/one-big-user.html

The basic takeaway:

  • Alias per customer
  • Shard routing
  • Now you can have indexes for big customers, shared indexes for little customers, and they all appear to be separate indexes
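
As a rough sketch of the takeaway above, the snippet below builds a request body for the Elasticsearch `_aliases` API, giving each customer an alias. For customers in a shared index the alias carries a routing value and a term filter; a big customer's alias simply points at a dedicated index. The index names, the `tenant-<id>` alias naming, the `tenant_id` field, and the helper functions are all assumptions made for illustration, and the body would still have to be sent to the cluster (e.g. `POST /_aliases`) by whatever client you use.

```python
import json


def tenant_alias_action(tenant_id: str, index: str, shared: bool) -> dict:
    """Build one `add` action for the _aliases API."""
    alias = {"index": index, "alias": f"tenant-{tenant_id}"}
    if shared:
        # In a shared index, route by tenant and filter searches to that
        # tenant's documents (assumes an exact-match `tenant_id` field).
        alias["routing"] = tenant_id
        alias["filter"] = {"term": {"tenant_id": tenant_id}}
    return {"add": alias}


def aliases_body(placements: dict) -> dict:
    """`placements` maps tenant id -> (index name, whether the index is shared)."""
    return {
        "actions": [
            tenant_alias_action(tenant_id, index, shared)
            for tenant_id, (index, shared) in placements.items()
        ]
    }


body = aliases_body({
    "acme": ("shared-small", True),
    "initech": ("shared-small", True),
    "bigcorp": ("bigcorp-dedicated", False),
})
print(json.dumps(body, indent=2))
```

The application then always talks to `tenant-<id>`, so moving a customer from the shared index to a dedicated one only changes where the alias points. Note that an alias filter applies to searches; documents written through the alias still need to carry the `tenant_id` value themselves.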
jzheaux
  • Is there any way to automatically manage which customers get a dedicated index, based on size? – Lior Goldemberg Feb 21 '19 at 17:37
  • You can take a look at curator. I'm not sure about the specific use case, but I've used it in the past to do several maintenance-type tasks. Also, the Elasticsearch API is pretty sophisticated. That said, the process of moving a customer from a shared index to a dedicated index with zero downtime is time-consuming - I'm not certain that I'd jump into having it be automated (unless I'm misunderstanding what you mean). – jzheaux Feb 21 '19 at 18:23
  • Indeed, there is no one-size-fits-all in the real world. We found similar advice here: https://pulse.support/blog/multi-tenancy-with-elasticsearch-and-opensearch-7f1571 – synhershko Oct 17 '22 at 19:59
11

This link is too important not to mention here: http://www.bigeng.io/elasticsearch-scaling-multitenant/

Good architecture dilemmas, and great performance analysis / reasoning.

TL;DR: they had index groups built around shard allocation filtering to segregate load across nodes in the cluster.
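
To give an idea of what shard allocation filtering looks like in practice, here is a minimal sketch that builds per-index settings bodies using the `index.routing.allocation.require.<attribute>` setting. It does not reproduce the linked article's actual setup: the `index_group` attribute name, the group and index names, and the helper functions are hypothetical, and the nodes would need a matching custom attribute configured (which may not be possible on every managed service).

```python
def allocation_settings(group: str, attribute: str = "index_group") -> dict:
    """Settings body that pins an index's shards to nodes tagged with `group`.

    Assumes nodes carry a custom attribute (hypothetical name `index_group`)
    whose value identifies the group of nodes they belong to.
    """
    return {f"index.routing.allocation.require.{attribute}": group}


def allocation_plan(groups: dict) -> dict:
    """`groups` maps group name -> list of index names.

    Returns index name -> settings body, one per index, each of which would be
    applied with an update-settings request (e.g. PUT /<index>/_settings).
    """
    return {
        index: allocation_settings(group)
        for group, indices in groups.items()
        for index in indices
    }


plan = allocation_plan({
    "group-a": ["shared-small-1", "shared-small-2"],
    "group-b": ["bigcorp-dedicated"],
})
for index, settings in plan.items():
    print(index, settings)
```

With settings like these, a heavy tenant's index can be kept on its own set of nodes so that its load does not affect the indices in other groups.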

Froyke
2

To summarize all the answers and articles:

  1. Use a shared index with custom routing, accessed through aliases.

    1.1) Special case: a big client can have a dedicated index, but only if needed.

Reference:

Use cases => https://www.elastic.co/blog/found-multi-tenancy

How to do => https://www.elastic.co/guide/en/elasticsearch/guide/current/faking-it.html

Anonymous Creator