
Our current software solution uses a local ES installation (1 cluster with 1 node) to store documents so that the user can search them later. Documents are not ingested continuously but roughly once a month, using bulk requests. The document set isn't huge and the documents are small. This solution has worked without problems on ordinary laptop PCs (i5 with 8 GB RAM), since the use case does not demand high performance.

Now we're facing 2 new requirements for our software solution:

  1. Should be branded for other customers
  2. The same final user (using the same machine) should be able to work with several instances of our solution (from different customers)

With these 2 new requirements the current solution cannot be used, because all documents would be indexed in the same node using the same index. Subsequent searches would return documents from different customers.

A first approach to solve this issue was to index documents by customer, that is, to create one index per customer and index/search documents in the corresponding index. However, we're thinking of another solution that allows us the following:

  • ES indexed information must be easy to remove from the system (e.g. by removing the data folder)
  • Some customers may want to use a newer version of our solution (e.g. one that uses ES 7) whereas others will remain on older versions (e.g. ES 6)

Based on this, I think the solution would be to have several ES installations on the same PC, each with its own customer-dependent configuration:

  • Different cluster
  • Different node name and port
  • Different ES version
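As a rough illustration of what such per-customer configuration could look like, here is a minimal sketch that renders an `elasticsearch.yml` for one instance. The function name, the directory layout, and the port numbers are hypothetical; the setting names (`cluster.name`, `node.name`, `http.port`, `path.data`, `path.logs`, `discovery.type`) are standard ES settings, but the transport port setting name differs between versions, so the choice made here is an assumption to check against the documentation of the exact ES version in use.

```python
from pathlib import PurePosixPath


def render_es_config(customer: str, http_port: int, transport_port: int,
                     base_dir: str, es_major: int) -> str:
    """Return elasticsearch.yml text for one customer's single-node instance.

    All names and paths are illustrative, not taken from the question.
    """
    root = PurePosixPath(base_dir) / customer
    # Assumption: ES 6 uses transport.tcp.port, newer ES 7 releases use
    # transport.port. Verify against the docs for the exact version.
    transport_key = "transport.port" if es_major >= 7 else "transport.tcp.port"
    lines = [
        f"cluster.name: {customer}-cluster",
        f"node.name: {customer}-node",
        f"http.port: {http_port}",
        f"{transport_key}: {transport_port}",
        # A separate data folder per customer, so it can be removed on its own.
        f"path.data: {root / 'data'}",
        f"path.logs: {root / 'logs'}",
        # Keep the node from trying to join any other cluster on the machine.
        "discovery.type: single-node",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_es_config("customer-a", 9201, 9301, "/opt/es", es_major=6))
    print(render_es_config("customer-b", 9202, 9302, "/opt/es", es_major=7))
```

Giving each instance fixed, distinct ports and its own data path is what keeps the installations from colliding when more than one is present on the machine.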

My questions, then, would be: has anyone faced a similar use case? Would there be performance issues from installing several ES instances and letting their services run continuously at the same time? What problems could arise from this configuration?

Any help would be appreciated.


UPDATE

Based on the answer received and for possible future answers, I would like to clarify a bit more about the architecture of our solution + ES:

  • Our solution is a desktop application executed on normal laptop PCs
  • Single user
  • Even if more than one customer specific solution is installed in the PC, only 1 will be active at a time
  • Searches will be executed sporadically when the user wants to search for a specific document (as if someone opens Wikipedia to search for an article)

So topics such as ...

  • Infrastructure failure
  • Data replication
  • Performance at high search demand

... are not critical
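Since only one customer-specific application is active at a time, one option (an assumption on my side, not something we have implemented) would be to start the matching ES installation only when its application opens, with a small heap. The sketch below builds such a launch command. The `-d` (daemonize) and `-p` (pid file) options and the `ES_JAVA_OPTS` variable are standard for the ES launcher on Linux/macOS; the paths, the heap size, and the function names are hypothetical, and on Windows the launcher is `elasticsearch.bat` with different service handling.

```python
import os
import subprocess
from pathlib import Path


def build_launch(es_home: str, heap: str = "512m"):
    """Return (command, environment) to start one ES install as a daemon."""
    home = Path(es_home)
    # -d runs ES in the background; -p writes the process id to a file so
    # the application can stop the instance later.
    cmd = [str(home / "bin" / "elasticsearch"), "-d", "-p", str(home / "es.pid")]
    env = dict(os.environ)
    # Small fixed heap: the document set is small and searches are sporadic.
    env["ES_JAVA_OPTS"] = f"-Xms{heap} -Xmx{heap}"
    return cmd, env


def start_instance(es_home: str) -> None:
    """Start the ES installation located at es_home (hypothetical path)."""
    cmd, env = build_launch(es_home)
    subprocess.run(cmd, env=env, check=True)
```

With this approach the idle installations would cost disk space only, not RAM or CPU.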

1 Answer


You can run multiple installations of ES on the same machine in production, but it has a lot of disadvantages.

  1. Ideally, you should have at least 1 replica of each shard, and it should be located on another physical machine (node) so that the cluster can recover from an infrastructure failure. This is done to improve the resiliency of your system.

  2. In production, it's common to come across a use case where a single shard is not enough and you need to split your index into multiple primary shards to make it horizontally scalable. If you use just 1 physical server, having multiple shards will not help you.

  3. Having multiple installations also doesn't help when one installation receives a lot of traffic and consumes all the physical resources (RAM, CPU, disk), bringing down every other installation in production as well. It also becomes difficult to isolate the root cause and fix the issue quickly, because an ES installation is not stateless: you cannot just start the same installation on another machine without moving all its data and configuration.

Basically, yours is a truly tenant-based SaaS application, and looking at your requirements, you should design your system with the following in mind:

  1. Upgrading the ES version is sometimes not straightforward and can involve a lot of breaking changes in your application code as well, so just running a cluster with the latest version will not solve the problem. Hence your application should expose a tenant (your customer) registration API that also takes the ES version the customer wants to use, and your code handles it accordingly.
  2. "ES indexed information must be easily removed from the system": I didn't get what the issue is here. You can simply delete it using the ES API, which is the recommended way of doing that, instead of doing it manually.
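To make the second point concrete, here is a minimal sketch of calling the delete index API (`DELETE /<index>`) with only the Python standard library. The base URL and the index name are placeholders, and the helper names are made up for illustration.

```python
import urllib.request


def delete_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE /<index> request."""
    url = f"{base_url.rstrip('/')}/{index}"
    return urllib.request.Request(url, method="DELETE")


def delete_index(base_url: str, index: str) -> int:
    """Send the request to a running ES node and return the HTTP status."""
    with urllib.request.urlopen(delete_index_request(base_url, index)) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder values: a local node and a hypothetical per-customer index.
    req = delete_index_request("http://localhost:9200", "customer-a")
    print(req.get_method(), req.full_url)
```

Deleting the index this way removes that customer's documents without touching the data folder by hand.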

I hope my answer is clear. Let me know if I missed any of your requirements or if you need further clarification.

Based on the update on the question I am adding below points:

  1. As the OP mentioned, it's a very small desktop application and not a server-side application, so it's very important not to mix and store the content of each customer together. Anybody can install an ES web admin tool like https://github.com/lmenezes/cerebro and read the data of other customers.

  2. The best solution in your case is to have a single installation of ES, in the version specified by the customer, with just 1 index belonging to the customer running the desktop application. You can then easily use the delete API, as I mentioned earlier.
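A minimal sketch of the one-index-per-customer idea: derive the index name from the customer and always search against that index only (`POST /<index>/_search` with a `match` query). The `docs-` prefix, the `content` field, and the function names are assumptions made for illustration, not part of the OP's application.

```python
import json
import re
import urllib.request


def index_for(customer: str) -> str:
    """Map a customer name to a lowercase index name (hypothetical scheme)."""
    # ES index names must be lowercase and avoid spaces and special characters.
    slug = re.sub(r"[^a-z0-9]+", "-", customer.lower()).strip("-")
    if not slug:
        raise ValueError("customer name has no usable characters")
    return f"docs-{slug}"


def search_request(base_url: str, customer: str, text: str,
                   field: str = "content") -> urllib.request.Request:
    """Build (but do not send) a search scoped to one customer's index."""
    body = json.dumps({"query": {"match": {field: text}}}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/{index_for(customer)}/_search"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = search_request("http://localhost:9200", "Customer A", "invoice")
    print(req.get_method(), req.full_url)
```

Because every request names the customer's index explicitly, a search can never return another customer's documents.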

There is no need to have multiple installations at all. Even when they are not active, they still consume local disk space (which matters even more for a desktop app) and can cause this and this issue. Storing unnecessary information in a desktop app is not a clean design, and it also creates a security issue, which is a much bigger concern in general.

Amit
  • Thanks a lot for taking time to answer. I understand the disadvantages of what you say regarding the scalability. However, our use case is very "simple" and those topics are not critical for us. I've updated the question with some clarifications on how the architecture looks like. – Alejandro González Jul 15 '19 at 10:45
  • @AlejandroGonzález, thanks for the clarification, but apart from the ES version, what are your other design reasons for not considering different indices per customer? – Amit Jul 15 '19 at 10:51
  • the other reason was to be able to delete the information from ES by directly removing the contents of the data folder without using any API. AFAIK, in the case that the information is mixed, it's not easy to know what to remove. – Alejandro González Jul 15 '19 at 10:58
  • @AlejandroGonzález, sorry for the delay in response, but when you have a separate index for each customer, you can easily delete the customer's index using the delete API I mentioned earlier. ES creates separate shards (which are physically stored in different files) for each index, hence your information isn't mixed. – Amit Jul 16 '19 at 03:38
  • I still don't see how a single installation could handle the need for different ES versions. Imagine customer A using software version 1, which requires ES 6, and customer B using software version 2, which requires ES 7. Would customer A be able to use ES properly when the ES libraries of version 1 are version 6? – Alejandro González Jul 16 '19 at 07:34
  • As yours is a desktop app, for customer A you install ES 6 and for customer B you install ES 7, based on your customer requirements. I am assuming this is information you need to know anyway. – Amit Jul 16 '19 at 08:19
  • But this would imply several installations on the same PC. One of the requirements was that applications from different customers could be installed on the same PC because the one user (external from the companies) could make use of them. – Alejandro González Jul 16 '19 at 10:45
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/196526/discussion-between-amit-khandelwal-and-alejandro-gonzalez). – Amit Jul 16 '19 at 12:09