
I am designing a system that has a database for storing users and information related to them. More specifically, each user in the table has very little information: something like Name, Password, uid.

Each user then has zero or more containers. My initial approach is a second table in the database that holds the containers, with a field referencing the user who owns each one: something like containerName, content, owner.

So a query on data from a container would look something like:

SELECT content
  FROM containers
 WHERE (containerName='someContainer' AND owner='someOwner');

My question is whether this is a good approach. I am thinking about scalability: say we have thousands of users with, say, 5 containers each (each user could have a different number of containers, but 5 would probably be a typical case). My concern is that searching the database will become slow when at most 5 entries out of 5*1000 could ever be relevant to one query. (We typically only want a specific container's content from our query, so we are looking into the database with an overhead of basically 4995 entries, am I right?) And what happens if I sign up a million users? It would become a huge table, which just intuitively feels like a bad idea.

A second idea I had was to have one table per user. That doesn't feel like a very good solution either, since it would give me 1000 tables in the database, which (also by intuition) seems like a bad way to do it.

Any help in understanding how to design this would be greatly appreciated. I hope it's all clear and easy to follow.

vyegorov
qrikko
  • Will you have all containers being unique? Or will it be some 20-30 containers shared across all users? – vyegorov May 18 '12 at 07:55
  • Hmm... Every user-container pair is unique, but different users may have an instance of "the same container". So in effect the content of userA-containerA differs from the content of userB-containerA. Does that make sense? – qrikko May 18 '12 at 08:00
  • Then you should create a separate table for containers, use `container_id` in the `contents` table and join 3 tables. This will perform much faster and will occupy less space. – vyegorov May 18 '12 at 08:07
  • I did a bit of struggling to understand that but I think I see the way you intend for it to work. And it makes sense. So basically the content-lookup would reduce duplication of containers, is that right? – qrikko May 18 '12 at 08:21
  • Correct, right now your design [is not in 2NF](http://stackoverflow.com/a/724032/1154462). – vyegorov May 18 '12 at 08:33
  • Thanks, that link was very informative as well. Helped me put the pieces together I believe. – qrikko May 18 '12 at 08:38
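
The three-table layout suggested in the comments could look roughly like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere. Only `container_id` and the `contents` table name come from the comments; every other table name, column name, and sample row is an assumption made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        uid      INTEGER PRIMARY KEY,
        name     TEXT NOT NULL UNIQUE,
        password TEXT NOT NULL
    );
    -- Each distinct container is stored once, however many users have it.
    CREATE TABLE containers (
        container_id   INTEGER PRIMARY KEY,
        container_name TEXT NOT NULL UNIQUE
    );
    -- One row per (user, container) pair, holding that user's content.
    CREATE TABLE contents (
        uid          INTEGER NOT NULL REFERENCES users(uid),
        container_id INTEGER NOT NULL REFERENCES containers(container_id),
        content      TEXT,
        PRIMARY KEY (uid, container_id)
    );
    INSERT INTO users VALUES (1, 'userA', 'hashA'), (2, 'userB', 'hashB');
    INSERT INTO containers VALUES (1, 'containerA');
    INSERT INTO contents VALUES (1, 1, 'content of A'), (2, 1, 'content of B');
""")


def get_content(conn, user_name, container_name):
    """Join the three tables to fetch one user's content for one container."""
    row = conn.execute(
        """
        SELECT c.content
          FROM contents c
          JOIN users u      ON u.uid = c.uid
          JOIN containers k ON k.container_id = c.container_id
         WHERE u.name = ? AND k.container_name = ?
        """,
        (user_name, container_name),
    ).fetchone()
    return row[0] if row else None
```

Here userA and userB share the single `containerA` row, while each keeps separate content, which matches the "same container, different content" case described in the comments.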

2 Answers


The accepted way of handling this is to create an INDEX on the owner field. That way, MySQL can optimize queries with owner = 'some value' conditions.

See also: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
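
As a rough illustration of what the index changes, the sketch below builds the 5*1000-row table from the question and compares the query plan before and after indexing `owner`. It uses Python's built-in `sqlite3` module rather than MySQL (where `EXPLAIN` plays the same role), so the exact plan text will differ; the index name `idx_containers_owner` and the generated sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE containers (containerName TEXT, content TEXT, owner TEXT)")

# 1000 users with 5 containers each, as in the question.
conn.executemany(
    "INSERT INTO containers VALUES (?, ?, ?)",
    [(f"container{c}", f"data {u}-{c}", f"user{u}")
     for u in range(1000) for c in range(5)],
)

QUERY = "SELECT content FROM containers WHERE containerName = ? AND owner = ?"
ARGS = ("container3", "user42")


def plan(conn):
    """Return the planner's description of how QUERY would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ARGS).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


plan_before = plan(conn)   # without an index: a scan over the whole table
conn.execute("CREATE INDEX idx_containers_owner ON containers (owner)")
plan_after = plan(conn)    # with the index: a search restricted to one owner

result = conn.execute(QUERY, ARGS).fetchall()
print(plan_before)
print(plan_after)
```

With the index in place the database only visits the handful of rows belonging to that owner instead of all 5000. Since the query also filters on `containerName`, a composite index on `(owner, containerName)` would be another option to consider.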

You're right in saying that 1000 tables is not scalable. Once you start reaching a few million records you might want to consider sharding (splitting records across several locations based on user attributes) ... but by that time you'd already be quite successful, I think ;-)
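
To give an idea of what sharding by a user attribute can mean in practice, here is a minimal sketch that routes each owner to one of several databases by hashing the owner name. The shard host names and the `shard_for` helper are hypothetical, and real deployments often use more elaborate schemes (lookup tables, consistent hashing) so that adding a shard does not move most users.

```python
import hashlib

# Hypothetical shard locations; in practice these would be connection settings.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def shard_for(owner: str, shards=SHARDS) -> str:
    """Map an owner to a shard using a stable hash of the owner name.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and would route users inconsistently.
    """
    digest = hashlib.sha256(owner.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

All of a given owner's containers then live on one shard, so the query from the question still touches a single database.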

Ja͢ck

If it is an RDBMS (like Oracle / MySQL) database, you can create indexes on frequently queried columns to optimize table traversal and queries. Indexes are automatically created for PRIMARY keys and (optionally) for FOREIGN keys.
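
As a small sketch of the primary-key case, the example below declares `(owner, containerName)` as a composite primary key and checks that an index appears without any explicit CREATE INDEX. It uses Python's built-in `sqlite3` module as a stand-in, and choosing that pair as the key is an assumption based on the asker's statement that every user-container pair is unique. Foreign-key behaviour differs between database engines and is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE containers (
        owner         TEXT NOT NULL,
        containerName TEXT NOT NULL,
        content       TEXT,
        PRIMARY KEY (owner, containerName)  -- assumed unique per user
    )
""")

# The second column of each index_list row is the index name; SQLite names
# the index backing a PRIMARY KEY constraint automatically.
index_names = [row[1] for row in conn.execute("PRAGMA index_list(containers)")]

# Ask the planner how it would run the query from the question.
plan = " | ".join(
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT content FROM containers WHERE containerName = ? AND owner = ?",
        ("someContainer", "someOwner"),
    )
)
print(index_names)
print(plan)
```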

Ahamed Mustafa M