
Our application needs 5 collections in a db. When we add clients to our application, we would like to maintain a separate db for each customer. For example, if we have 500 customers, we would have 500 dbs and 2500 collections (each db has 5 collections). This way we can keep each customer's data separate. My concern is: will this lead to any performance problems?
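A minimal sketch of the layout described in the question. The `tenant_` prefix and the five collection names are hypothetical placeholders, not from the original post. It derives one database name per customer, rejects names MongoDB would not accept (the forbidden characters and the under-64-character limit follow MongoDB's documented naming restrictions as I understand them; verify them for your server version), and lists the resulting `db.collection` namespaces:

```python
# Hypothetical collection names; the question only says "5 collections".
COLLECTIONS = ["users", "orders", "invoices", "events", "settings"]

# Characters MongoDB disallows in database names (Unix and Windows rules combined),
# plus the null character.
FORBIDDEN = set('/\\. "$*<>:|?\0')
MAX_DB_NAME_LEN = 63  # database names must be fewer than 64 characters


def tenant_db_name(tenant_id: str, prefix: str = "tenant_") -> str:
    """Map a customer id to its own database name (hypothetical naming scheme)."""
    if not tenant_id:
        raise ValueError("empty tenant id")
    name = f"{prefix}{tenant_id}"
    if any(ch in FORBIDDEN for ch in name):
        raise ValueError(f"invalid database name: {name!r}")
    if len(name) > MAX_DB_NAME_LEN:
        raise ValueError(f"database name too long: {name!r}")
    return name


def tenant_namespaces(tenant_ids):
    """Every 'db.collection' namespace the one-database-per-customer layout creates."""
    return [f"{tenant_db_name(t)}.{c}" for t in tenant_ids for c in COLLECTIONS]
```

For 500 customers, `len(tenant_namespaces([f"c{i:03d}" for i in range(500)]))` comes to 2500, matching the count in the question (500 databases times 5 collections).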

UPDATE: Also follow this google-group discussion.

user10
  • With a large number of customers you might be better off seeking isolated instances instead, but that depends on more than what you have given. – Sammaye Jun 04 '13 at 12:21
  • That's what shards are for: each one is a separate mongod instance/replica set. This is an extremely common configuration for many MongoDB users who host multiple tenants, and separate DBs are a standard answer here. – Asya Kamsky Jun 04 '13 at 23:13

1 Answer


Our application needs 5 collections in a db. When we add clients to our application, we would like to maintain a separate db for each customer. For example, if we have 500 customers, we would have 500 dbs and 2500 collections (each db has 5 collections). This way we can keep each customer's data separate.

That's a great idea. On top of the logical separation this provides, you will also be able to use database-level security in MongoDB to help prevent inadvertent access to other customers' data.
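To make "database-level security" concrete, here is a minimal sketch that builds a `createUser` command document granting the built-in `readWrite` role on one tenant's database only. The user name, password, and database name are hypothetical; the command is only constructed here, not sent to a server:

```python
def tenant_user_command(tenant_db: str, username: str, password: str) -> dict:
    """createUser command whose only privilege is readWrite on one tenant database."""
    return {
        "createUser": username,  # command name must be the first key
        "pwd": password,
        "roles": [{"role": "readWrite", "db": tenant_db}],
    }


def databases_granted(cmd: dict) -> set:
    """Which databases the roles in a createUser command touch."""
    return {role["db"] for role in cmd["roles"]}


cmd = tenant_user_command("tenant_acme", "acme_app", "change-me")
print(databases_granted(cmd))  # {'tenant_acme'}
```

With PyMongo such a document can be passed to `client[tenant_db].command(cmd)`, so the user is defined in, and authenticates against, the tenant's own database. An application connected as that user cannot read another tenant's database by accident.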

My concern is, will it lead to any performance problems?

No. In fact it will help: with database-level locking, extremely heavy lock contention for one customer (if that's possible in your scenario) would not affect performance for other customers. They may still compete for the same I/O bandwidth, but if you use the --directoryperdb option, you can place those DBs on separate physical devices.
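The isolation argument can be illustrated with a toy model: one lock per database name, so a writer holding tenant A's lock never blocks tenant B. This is only an analogy for the database-level lock the answer refers to (the locking of that era's storage engine); it is not MongoDB's implementation, and newer storage engines lock at a finer granularity:

```python
import threading


class PerDatabaseLocks:
    """Toy model of database-level locking: one independent lock per database name."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = {}

    def lock_for(self, db_name):
        """Return the lazily created lock guarding one database."""
        with self._guard:
            lock = self._locks.get(db_name)
            if lock is None:
                lock = self._locks[db_name] = threading.Lock()
            return lock


locks = PerDatabaseLocks()
locks.lock_for("tenant_a").acquire()  # tenant A is busy writing
# Tenant B's lock is independent, so a non-blocking acquire succeeds.
print(locks.lock_for("tenant_b").acquire(blocking=False))  # True
# Another writer on tenant A would have to wait.
print(locks.lock_for("tenant_a").acquire(blocking=False))  # False
```

With a single shared database, all tenants would map to the same lock, and the second `acquire` for tenant B would fail as well.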

Sharding will also make scaling easy, because you won't even have to partition any collections: you can simply distribute databases round-robin across multiple shards so the load is spread over separate clusters (if and when you reach that level).
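A minimal sketch of the round-robin idea, assuming hypothetical shard names and that you want an explicit, even spread (MongoDB normally picks a primary shard for a new database on its own). It assigns each tenant database to a shard and builds the corresponding `movePrimary` admin command documents. Check the `movePrimary` caveats for your version before running anything like this against a live cluster:

```python
from itertools import cycle


def round_robin_placement(tenant_dbs, shard_names):
    """Assign each tenant database to a shard in round-robin order."""
    if not shard_names:
        raise ValueError("need at least one shard")
    # zip stops when tenant_dbs is exhausted; cycle repeats the shard list.
    return dict(zip(tenant_dbs, cycle(shard_names)))


def move_primary_commands(placement):
    """Admin commands that would make each assigned shard the database's primary shard."""
    return [{"movePrimary": db, "to": shard} for db, shard in placement.items()]


placement = round_robin_placement(["tenant_a", "tenant_b", "tenant_c"], ["shardA", "shardB"])
print(placement)  # {'tenant_a': 'shardA', 'tenant_b': 'shardB', 'tenant_c': 'shardA'}
```

Because every tenant's collections live together in one database, moving a whole database is the unit of balancing, and no shard key has to be chosen for any collection.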

Contrary to the claim in the other answer, the TTLMonitor thread does NOT pull documents into RAM unless they are being deleted (and added to the free list). It works off TTL indexes, both to tell whether any documents need to expire and to locate those documents directly.
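The sketch below first builds a `createIndexes` command for a TTL index (the collection and field names are hypothetical). It then uses a toy model to show why expiry can work off the index alone. The index is modeled as a list of `(date, _id)` pairs sorted by date, and expired ids are found with a binary search, without reading any document bodies. The model only illustrates the answer's point and is not MongoDB's actual TTL code:

```python
import bisect
from datetime import datetime, timedelta, timezone


def ttl_index_command(collection: str, field: str, seconds: int) -> dict:
    """createIndexes command for a TTL index on a date field."""
    return {
        "createIndexes": collection,
        "indexes": [{"key": {field: 1}, "name": f"{field}_1",
                     "expireAfterSeconds": seconds}],
    }


def expired_ids(index_entries, ttl_seconds, now):
    """Toy model: index_entries is a list of (date, _id) sorted by date.
    Returns ids whose date is strictly older than now - ttl, using only the index."""
    cutoff = now - timedelta(seconds=ttl_seconds)
    # (cutoff,) sorts before any (cutoff, _id), so pos is the first entry not older than cutoff.
    pos = bisect.bisect_left(index_entries, (cutoff,))
    return [doc_id for _, doc_id in index_entries[:pos]]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
index = sorted([(now - timedelta(hours=2), "a"),
                (now - timedelta(minutes=30), "b"),
                (now - timedelta(hours=5), "c")])
print(expired_ids(index, 3600, now))  # ['c', 'a']
```

In PyMongo the same index is typically created with `collection.create_index("createdAt", expireAfterSeconds=3600)`.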

I would strongly recommend against the one-database, many-collections solution: it doesn't let you partition the load, doesn't provide security, and isn't any easier to handle on the application side.

Asya Kamsky
  • +1 You probably have loads more experience than I do, though I still think a proper answer demands more information about the situation. – Maiku Mori Jun 05 '13 at 11:26
  • I agree, for the most part, but I think enough basic information is given: once there are many more databases, they can shard with less complexity than they would if they had to split collections across many shards. In other words, their use case will easily scale horizontally, which makes splitting on db a safe option. – Asya Kamsky Jun 05 '13 at 12:49
  • I think this makes sense in OP's scenario, which I believe involves few users but a relatively high document count per user. But if OP were to scale past a couple thousand customers (let's say 1 million), would this still be a good approach? – nick Oct 12 '15 at 03:23
  • This would likely not be the right strategy if a customer is more like a "user": if you have millions of customers, it's no different from having millions of users, and you likely won't have separate collections or DBs for each of them. – Asya Kamsky Oct 13 '15 at 01:23
  • What solution do you mean if you speak of "the one database many collections solution" in your last sentence? To my mind the alternative to the "500 dbs/5 collections each-->2500 collections"-solution is the "1 db/5 collections"-solution. Isn't it? – heinob Mar 15 '18 at 12:07
  • Can anyone confirm if this still holds true in 2020? – holydragon Jun 18 '20 at 06:25
  • It does, though not to an arbitrary number of collections: beyond tens of thousands of collections, the number of files on disk doesn't really scale well. – Asya Kamsky Jun 22 '20 at 06:09
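The file-count concern in the last comment can be estimated with simple arithmetic. WiredTiger, MongoDB's default storage engine since 3.2, stores each collection and each index in its own file. Every collection also has the mandatory `_id` index. The rough estimate below ignores journal, metadata, and diagnostic files, and the number of extra indexes per collection is an assumption you would replace with your own:

```python
def wiredtiger_file_estimate(tenants: int, collections_per_db: int,
                             extra_indexes_per_collection: int = 0) -> int:
    """Rough count of data files: one per collection, one for its _id index,
    plus one per additional index."""
    files_per_collection = 1 + 1 + extra_indexes_per_collection
    return tenants * collections_per_db * files_per_collection


print(wiredtiger_file_estimate(500, 5))     # 5000
print(wiredtiger_file_estimate(500, 5, 2))  # 10000
```

For the question's 500 customers this stays in the thousands of files. With tens of thousands of tenants, the same formula quickly reaches hundreds of thousands of files, which is the point at which the comment warns the approach stops scaling well.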