159

So this is more of a design question.

I have one primary key (say the user's ID), and I have tons of information associated with that user.

Should I have multiple tables broken down into categories according to the information, or should I have just one table with many columns?

The way I used to do it was to have multiple tables, so say, one table for application usage data, one table for profile info, one table for back end tokens etc. to keep things looking organized.

Recently some one told me that it's better not to do it that way and having a table with lots of columns is fine. The thing is, all those columns have the same primary key.

I'm pretty new to database design so which approach is better and what are the pros and cons?

What's the conventional way of doing it?

Lee Mac
  • 15,615
  • 6
  • 32
  • 80
Xavier_Ex
  • 8,432
  • 11
  • 39
  • 55
  • For clarity, correct me if I'm wrong, but I think the "multiple tables" can be understood as link/associative table: https://en.wikipedia.org/wiki/Associative_entity – cellepo Sep 28 '16 at 16:10
  • 1
    Is this database needed for analytical purposes or for operational/transactional processing? – Alexander Radev Mar 08 '19 at 15:41
  • one conventional approach to relational database design is Entity Attribute Relationship modeling. Normative pattern is for each "entity" in the model (entity=a person, place, thing, concept or event, that can be uniquely identified and we need to store information about), each "entity" is implemented as a table, with a primary/unique key each single-valued "attribute" is implemented as a column in the entity table. sometimes there are reasons we want to split an entity table to multiple tables, typical reasons are implementation concerns, max row size, performance, contention, transaction, – spencer7593 Apr 15 '22 at 13:40

8 Answers8

145

Any time information is one-to-one (each user has one name and password), then it's probably better to have it one table, since it reduces the number of joins the database will need to do to retrieve results. I think some databases have a limit on the number of columns per table, but I wouldn't worry about it in normal cases, and you can always split it later if you need to.

If the data is one-to-many (each user has thousands of rows of usage info), then it should be split into separate tables to reduce duplicate data (duplicate data wastes storage space, cache space, and makes the database harder to maintain).

You might find the Wikipedia article on database normalization interesting, since it discusses the reasons for this in depth:

Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

Denormalization is also something to be aware of, because there are cases where repeating data is better (since it reduces the amount of work the database needs to do when reading data). I'd highly recommend making your data as normalized as possible to start out, and only denormalize if you're aware of performance problems in specific queries.

Brendan Long
  • 53,280
  • 21
  • 146
  • 188
  • 1
    Thanks for your answer, so after reading it I think what I was talking about was the one-to-one information situation, when a user has many one-to-one columns. – Xavier_Ex Mar 19 '12 at 17:29
  • @Xavier_Ex - Yeah, if there's only one column per user, then just one huge users table will be easier to work with (and a lot easier for the DB engine to optimize). – Brendan Long Mar 19 '12 at 17:34
  • Your edited post provides more helpful information! I have a new concern that if some of the columns will be frequently updated, should I put those in a separate tables? For example the date-of-birth of a user will not be updated ever, but the back end token may be invalidated after a period of time and will require frequent updates. Would it be better if I separate the tables in this manner to improve the performance? I'll now go read about the wiki that you mentioned :) – Xavier_Ex Mar 19 '12 at 17:45
  • @Xavier_Ex - I would not recommend it. You get significantly better performance when you can look up all of the data you need in one table (see the denormalization article). Joins are expensive because (1) they require looking up data in multiple places, which can involves seeks on a spinning disk, (2) generally require multiple indexes and some sort of merge, and (3) they make query planning harder, which not only takes time, but also increases the chances that the query optimizer will get something wrong (and badly optimized queries can be *really* slow). – Brendan Long Mar 19 '12 at 17:55
  • Basically, my recommendation is to make your data as "normal" as possible now, then if you find any queries are slow, do whatever you need to do to hit your performance goals (this is the same advice I'd give for any kind of optimization). – Brendan Long Mar 19 '12 at 18:00
  • Hmmmm I agree with your points but not entirely on the first one though, even if I store all the information in one table, seeking data might still require spinning the disk, the data might not be physically in one place when they virtually are. – Xavier_Ex Mar 19 '12 at 19:42
  • 1
    Recently I was faced with this same problem, because MySQL InnoDB tables have a relatively small length limit (~8000 bytes). In my problem table (data from very lengthy insurance forms, more than 100 columns) we have multiple varchar columns, all UTF8. So we easily filled the ~8000 bytes limit and got "error 139 from storage engine" all the time. So we had to split the table. (We tested with the newer Barracuda format and it worked without splitting, but our client's servers still use MySQL 5.0). – MV. Dec 13 '12 at 07:05
  • Basically, I feel like, it all depends on the use cases. Like if the use-case is always to get the profile data separately and back-end-tokens separately. Then one must go for the separate tables, instead of loading the huge data into memory. But if the use case is to get profile data and back-end tokens data and some other data together then one query is better than multiple queries. So it totally depends on the use-cases. – vkrishna17 Jun 13 '18 at 13:14
  • @vkrishna17 I'm saying you should have multiple tables, not that you should do multiple queries. This is what JOIN is for. – Brendan Long Jun 19 '18 at 17:01
18

One big table is often a poor choice. Related tables are what relational database were designed to work with. If you index properly and know how to write performant queries, they are going to perform fine.

When tables get too many columns, then you can run into issues with the actual size of the page that the database is storing the information on. Either the record can end up being too large for the page, in which can you may end up not being able to create or update a specific record which makes users unhappy or you may (in SQL Server at least) be allowed some overflow for particular datatypes (with a set of rules you need to look up if you are doing this) but if many records will overflow the page size you can create tremedous performance problems. Now how MYSQL handles the pages and whether you have a problem when the potential page size gets too large is something you would have to look up in the documentation for that database.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
7

Came across this, and as someone who used to use MySQL a lot, and then switched over to Postgres recently, one of the big advantages is that you can add JSON objects to a field in Postgres.

So if you are in this situation, you don't have to necessarily decide between one large table with many columns and splitting it up, but you can merge columns into JSON objects to reduce it e.g. instead of address being 5 columns, it can just be one. You can also query on that object too.

moinhaque
  • 220
  • 1
  • 4
  • 14
  • good point for storing data that WILL NEVER BE NEEDED to be searched directly in database but you should store them by any reason! – Saghachi Dec 13 '22 at 11:24
  • As I said, you can still search on the JSON objects as well. See here for how this can be done https://popsql.com/learn-sql/postgresql/how-to-query-a-json-column-in-postgresql – moinhaque Dec 15 '22 at 07:41
5

I have a good example. Overly Normalized database with the following set of relationships:

people -> rel_p2staff -> staff

and

people -> rel_p2prosp -> prospects

Where people has names and persons details, staff has just the staff record details, prospects has just prospects details, and the rel tables are relationship tables with foreign keys from people linking to staff and prospects.

This sort of design carries on for entire database.

Now to query this set of relations it's a multi-table join every time, sometimes 8 and more table join. It has been working fine up to mid this year, when it started getting very slow now that we past 40000 records of people.

Indexing and all low hanging fruits had been used up last year, all queries are optimized to perfection. This is the end of the road for the particular normalized design and management now approved a rebuilt of entire application that depends on it as well as restructure of the database, over a term of 6 months. $$$$ Ouch.

The solution will be to have a direct relation for people -> staff and people -> prospect

Vlad
  • 131
  • 2
  • 8
  • Would be interested to know how the rebuild went? Did you end up designing something similar to single table inheritance where you had a `type` being a `staff` or a `prospect`? – Coderama Apr 19 '17 at 00:30
  • 3
    Went with direct relation people -> staff and people -> prospect, works a charm, easy to use, fast to query. – Vlad Apr 20 '17 at 01:22
3

ask yourself these questions if you put everything in one table, will you have multiple rows for that user? If you have to update a user do you want to keep an audit trail? Can the user have more than one instance of a data element? (like phone number for instance) will you have a case where you might want to add an element or set of elements later? if you answer yes then most likely you want to have child tables with foreign key relationships.

Pros of parent/child tables is data integrity, performance via indexes (yes you can do it on a flat table also) and IMO easier to maintain if you need to add a field later, especially if it will be a required field.

Cons design is harder, queries become slightly more complex

But, there are many cases where one big flat table will be appropriate so you have to look at your situation to decide.

Brian
  • 2,229
  • 17
  • 24
  • Thank you for reminding me! So in my case I was only considering the case where every user can't have more than one row so all information fields are one-to-one. Also the user cannot have more than one instance of the same element as I believe in the concept of one element cannot exist in more than one place. For the third question, yes I might add more elements to the table but they will not break the requirements I mentioned above. I think the parent/child table is good when I want to associate multiple rows to one user, but in this case my concern is that a user has many one-to-one columns. – Xavier_Ex Mar 19 '12 at 17:37
  • even if all of the elements are currently one to one, that doesn't obviate the need or desire to have parent/child tables IMO. Keeping a log of changed data is one use. lazy loading objects is another. while there are benefits to a single table structure there are benefits to parent child layouts as well (although I have seen people go to extremes with these as well). – Brian Mar 19 '12 at 18:16
1

I'm already done doing some sort of database design. for me, it depends on the difficulty of the system with database management; yeah it is true to have unique data in one place only but it is really hard to make queries with overly normalized database with lots of record. Just combine the two schema; use one huge table if you feel that you'll be having a massive records that are hard to maintain just like facebook,gmail,etc. and use different table for one set of record for simple system... well this is just my opinion .. i hope it could help.. just do it..you can do it... :)

0

The conventional way of doing this would be to use different tables as in a star schema or snowflake schema. Howeevr, I would base this strategy to be two fold. I believe in the theory that data should only exist in one place, there for the schema I mentioned would work well. However, I also believe that for reporting engines and BI suites, a columnar approach would be hugely beneficial becuase it is more supportive of the the reporting needs. Columnar approaches like those with infobright.org have huge performance gains and compression that makes using both approaches incredibly useful. Alot of companies are starting to realize that have just one database architecture in the organization is not supportive of the full range of their needs. Alot of companies are implementing both the concept of having more than one database achitecture.

Craig Trombly
  • 464
  • 2
  • 9
-6

i think having a single table is more effective but you should make sure that the table is organised in a manner that it shows the relationship,trend as well as the difference in variables of the same row. for example if the table shows age and grades of the students you should arange the table in a manner that thank highest scorer is well differentiated with the lowest scorer and the difference in the age of students is even.