
DB design 1: There is 1 table


    CREATE TABLE item (id INT PRIMARY KEY, name VARCHAR(20), description VARCHAR(10000));

DB design 2: There are 2 tables


    CREATE TABLE table1 (id INT PRIMARY KEY, name VARCHAR(20));
    -- the FOREIGN KEY ties each description to an existing id in table1
    CREATE TABLE table2 (id INT PRIMARY KEY, description VARCHAR(10000),
                         FOREIGN KEY (id) REFERENCES table1 (id));

Note: each id must have a description associated with it. We don't query the description as often as the name.

In design 1, one simple query gets both name and description; no join is needed. But what if we have 1 million records? Will it be slow?

In design 2 we need a join, so the database has to do some searching and matching of ids. This could be slow, but since we don't query the description often, it will be slow only some of the time, not all of the time.
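
For concreteness, here is a sketch of the two read patterns, using the table names above (`item` is the single-table name I've assumed for design 1; `42` is an arbitrary id):

    -- Design 1: one lookup, no join needed
    SELECT name, description FROM item WHERE id = 42;

    -- Design 2: the common case touches only table1...
    SELECT name FROM table1 WHERE id = 42;

    -- ...and the occasional description lookup pays for the join
    SELECT t1.name, t2.description
    FROM table1 t1
    JOIN table2 t2 ON t2.id = t1.id
    WHERE t1.id = 42;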

So which is the better design in this scenario?

Kiti
  • This is not answerable. The only way to find out is to measure performance using _your_ dataset and access patterns. The answer will depend a lot on many factors such as I/O bandwidth, processor speed, memory, disk speed, etc. You haven't even said which database you will be using, and there is no one invariant answer to this question. – Jim Garrison Sep 07 '13 at 03:15
  • That doesn't change the fact that my answer for my data and access pattern will likely be different from yours. You are going to have to try both out with real-world (_your_ real-world) data to decide. Nobody here can tell you which is better. – Jim Garrison Sep 07 '13 at 03:39

2 Answers


That's called vertical partitioning or "row splitting", and it is no silver bullet (nothing is). You are not getting "better performance"; you are just getting "different performance". Whether one set of performance characteristics is better than the other is a matter of engineering tradeoff, and it varies from one case to another.

In your case, 1 million rows will fit comfortably into the DBMS cache on today's hardware, producing excellent performance. So unless some of the other reasons for splitting apply, keep it simple and use a single table.
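
If you want to sanity-check the "fits in the cache" claim, you can measure the table's footprint directly (a sketch assuming PostgreSQL, since the question doesn't name a DBMS):

    -- total on-disk size of the table, indexes included, in human-readable form
    SELECT pg_size_pretty(pg_total_relation_size('item'));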

And if it's 1 billion rows (or 1 trillion, or whatever number is too large for the memory standards of the day), keep in mind that if you have indexed your data correctly, performance will remain excellent long after the data has grown bigger than the cache.
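
The index in question is just the usual one on the frequently queried column (standard SQL; `item` is the assumed single-table name from the question; id lookups already use the primary key):

    -- covers the frequent lookups by name
    CREATE INDEX idx_item_name ON item (name);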

Only in the most extreme of cases will you need to vertically partition the table for performance reasons. In that case you'll have to measure in your own environment, with your own access patterns, whether it brings any performance benefit at all, and whether that benefit is large enough to make up for the increased cost of joining.
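
A sketch of that measurement, assuming PostgreSQL (MySQL 8+ has EXPLAIN ANALYZE too) and the design-2 tables from the question:

    -- shows the chosen plan and the actual execution time of the join variant
    EXPLAIN ANALYZE
    SELECT t1.name, t2.description
    FROM table1 t1
    JOIN table2 t2 ON t2.id = t1.id
    WHERE t1.id = 42;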

Branko Dimitrijevic

That's over-optimization for 1 million records, in my opinion; it's really not that much. You could test the actual performance by generating about a million rows of dummy data in a test database and querying it. You'll see how it performs.
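
A minimal sketch of generating that dummy data, assuming PostgreSQL's generate_series and the single-table design from the question:

    -- one million synthetic rows; repeat() pads the description to a realistic size
    INSERT INTO item (id, name, description)
    SELECT g, 'name_' || g, repeat('x', 500)
    FROM generate_series(1, 1000000) AS g;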

ApplePie
  • What about 100 million records? Also, we don't query the "description" as often as the "name". – Kiti Sep 07 '13 at 02:46
  • At some point you have to test it yourself to measure the performance. There is a big difference between 1M and 100M rows. Also, what counts as "too long" is subjective; you know best what will take too much time. Your best bet is to generate dummy data (with similar properties to your real data), try a bunch of possible queries, and time their performance. There is no 'one' answer here. – ApplePie Sep 07 '13 at 04:00