When is it a good idea to move columns off a main table into an auxiliary table?

Question

Say I have a table like this:

  create table users (
   user_id int not null auto_increment,
   username varchar,
   joined_at datetime,
   bio text,
   favorite_color varchar,
   favorite_band varchar
   ....
 );

Say that over time, more and more columns -- like favorite_animal, favorite_city, etc. -- get added to this table. Eventually, there are like 20 or more columns.

At this point, I'm feeling like I want to move columns to a separate user_profiles table is so I can do select * from users without returning a large number of usually irrelevant columns (like favorite_color). And when I do need to query by favorite_color, I can just do something like this:

select * from users inner join user_profiles using user_id where
user_profiles.favorite_color = 'red';

Is moving columns off the main table into an "auxiliary" table a good idea?

Or is it better to keep all the columns in the users table, and always be explicit about the columns I want to return? E.g.

select user_id, username, last_logged_in_at, etc. etc. from users;

What performance considerations are involved here?

On the `select *` read: [Why is SELECT * considered harmful?](http://stackoverflow.com/questions/3639861/why-is-select-considered-harmful) — Ocaso Protal, Apr 04 '11 at 13:31

score 6 · Accepted Answer · answered Apr 04 '11 at 13:21

Don't use an auxiliary table if it's going to contain a collection of miscellaneous fields with no conceptual cohesion.

Do use a separate table if you can come up with a good conceptual grouping of a number of fields e.g. an Address table.

Of course, your application has its own performance and normalisation needs, and you should only apply this advice with proper respect to your own situation.

score 2 · Answer 2 · answered Apr 04 '11 at 13:29

I would say that the best option is to have properly normalized tables, and also to only ask for the columns you need.

A user profile table might not be a bad idea, if it is structured well to provide data integrity and simple enhancement/modification later. Only you can truly know your requirements.

score 2 · Answer 3 · answered Apr 04 '11 at 13:42

One thing that no one else has mentioned is that it is often a good idea to have an auxiliary table if the row size of the main table would get too large. Read about the row size limits of your specific databases in the documentation. There are often performance benefits to having tables that are less wide and moving the fields you don't use as often off to a separate table. If you choose to create an auxiliarary table with a one-to-one relationship make sure to set up the PK/FK relationship to maintain data integrity and set a unique index or constraint on the FK field to mainatin the one-to-one relationship.

And to go along with everyone else, I cannot stress too strongly how bad it is to ever use select * in production queries. You save a few seconds of development time and create a performance problem as well as make the application less maintainable (yes less - as you should not willy nilly return things you may not want to show on the application but you need in the database. You will break insert statements that use selects and show users things you don't want them to see when you use select *.).

score 1 · Answer 4 · answered Apr 04 '11 at 13:27

Try not to get in the habit of using SELECT * FROM ... If your application becomes large, and you query the users table for different things in different parts of your application, then when you do add favorite_animal you are more likely to break some spot that uses SELECT *. Or at the least, that place is now getting unused fields that slows it down.

Select the data you need specifically. It self-documents to the next person exactly what you're trying to do with that code.

Kuberchaun · Answer 5 · 2011-04-04T13:41:41.820

1

Don't de-normalize unless you have good reason to.

Adding a favorite column ever other day every time a user has a new favorite is a maintenance headache at best. I would highly consider creating a table to hold a favorites value in your case. I'm pretty sure I wouldn't just keep adding a new column all the time.

edited Apr 04 '11 at 13:41

answered Apr 04 '11 at 13:31

Kuberchaun

29,160
7
51
59

+1 for "Don't de-normalize unless you have good reason to." I'd guess performance or space usage are highly unlikely to be 'good reasons' in a 'users' table – Apr 04 '11 at 15:24
Pretty much the way I was taught and they way I now live by is that if your in an OLTP system the chances of actually having a good reason are so rare and in-frequent that when they happen you could take the group out for lunch to celebrate. – Kuberchaun Apr 04 '11 at 15:26

score 1 · Answer 6 · answered Apr 04 '11 at 13:46

The general guideline that applies to this (called normalization) is that tables are grouped by distinct entities/objects/concepts and that each column(field) in that table should describe some aspect of that entity

In your example, it seems that favorite_color describes (or belongs to) the user. Some times it is a good idea to moved data to a second table: when it becomes clear that that data actually describes a second entity. For example: You start your database collecting user_id, name, email, and zip_code. Then at some point in time, the ceo decides he would also like to collect the street_address. At this point a new entity has been formed, and you could conceptually view your data as two tables:

user: userid, name, email
address: steetaddress, city, state, zip, userid(as a foreign key)

So, to sum it up: the real challenge is to decide what data describes the main entity of the table, and what, if any, other entity exists.

Here is a great example of normalization that helped me understand it better

score 0 · Answer 7 · answered Apr 04 '11 at 13:22

0

When there is no other reason (e.g. there are normal forms for databases) you should not do it. You dont save any space, as the data must still stored, instead you waste more as you need another index to access them.

answered Apr 04 '11 at 13:22

flolo

15,148
4
32
57

This answer is (not intentionally) misleading. While it is true that no space is saved, performance can be dramatically increased. There are a number of possible benefits. An auxiliary table might allow the main table to have a fixed row size, which makes it very fast. Further information at http://dev.mysql.com/doc/refman/5.1/en/static-format.html – David Snabel-Caunt Apr 04 '11 at 13:32

score 0 · Answer 8 · answered Apr 04 '11 at 13:34

It is always better (though may require more maintenance if schemas change) to fetch only the columns you need.

This will result in lower memory usage by both MySQL and your client application, and reduced query times as the amount of data transferred is reduced. You'll see a benefit whether this is over a network or not.

score 0 · Answer 9 · answered Apr 04 '11 at 13:34

0

Here's a rule of thumb: if adding a column to an existing table would require making it nullable (after data has been migrated etc) then instead create a new table with all NOT NULL columns (with a foreign key reference to the original table, of course).

You should not rely on using SELECT * for a variety of reasons (google it).

answered Apr 04 '11 at 13:34

onedaywhen

55,269
12
100
138

@onedaywhen does anyone really do this? what is so wrong with having `nullable` columns – Apr 04 '11 at 15:25
@JackPDouglas: "does anyone really do this?" -- sure they do. Here (http://stackoverflow.com/questions/4431182/creating-a-db-table-null-best-practices/4431554#4431554) is an answer on SO that puts it neatly. Presumably you have already seen it because you too posted an answer to that question :) – onedaywhen Apr 11 '11 at 10:56
@JackPDouglas: "what is so wrong with having nullable columns" -- there's a lot of material out there on the subject e.g. Chris Date (http://www.dbdebunk.com/page/page/1706814.htm). In practise, consider that about once a week I see a question here on SO where a newbie is confused why `some_value = NULL` does not evaluate to TRUE and *every* time a SO user with very high rep in SQL will post misstatements in response, which kind of proves that the `NULL` value is confusing and leads to errors even for experienced users. – onedaywhen Apr 11 '11 at 11:02
Nulls are confusing, but when used sensibly are also very useful - which I have [said before](http://stackoverflow.com/questions/4431182/creating-a-db-table-null-best-practices/4432000#4432000) as you point out. Creating additional tables to avoid having nulls at all is overkill - the cost is too high and the gain too small. Perhaps high rep in SQL on SO is not such a great indication of competence – Apr 11 '11 at 12:23
@JackPDouglas: did you read the Chris Date article? He was responsible for 6NF, please take up your dissatisfaction with him :) Too many experienced SQL coders make errors with NULL to put it down to lack of "competence", IMO. I don't think anyone should need to deal with 3VL. – onedaywhen Apr 12 '11 at 07:12
@JackPDouglas: Here you go (http://stackoverflow.com/questions/5658457/not-equal-operator-in-t-sql-on-null/5658480#5658480), another one in today. 21.7k rep and still thinks that "NULL = NULL is false" [sic]. – onedaywhen Apr 14 '11 at 07:35
I get no rows from `select 'A' where null=null;` on postgres. I guess sometimes `null=null` is false – Apr 14 '11 at 19:20
@JackPDouglas: that's not even a valid query because lacks a `FROM` clause, but even if it was your proof is faulty. You merely would have shown that the evaluation of `NULL = NULL` causes the rows to be removed from the resultset. – onedaywhen Apr 19 '11 at 14:16
@JackPDouglas: Now, consider this: `CREATE TABLE T (c INTEGER NOT NULL, CHECK(NULL = NULL));` and this `INSERT INTO T (c) VALUES (1);` doesn't cause the `CHECK` to bite, have I proved that `NULL = NULL` is TRUE? Of course not. I've merely shown that `NULL = NULL` has not caused the `CHECK` to bite. Now read the spec: "A table check constraint is satisfied if and only if the specified search condition is not false for any row of a table." Therefore, I have proved that `NULL = NULL` is not false. – onedaywhen Apr 19 '11 at 14:20
er - you have merely shown that `null=null` is not *always* treated as false. `select 'A' where null=null;` proves `null=null` is *sometimes* treated as false (ie "causes the rows to be removed from the resultset" - that's what 'false' means in this context). It is a valid query on postgres. As far as I know, `null=null` always evaluates to `null` so what we show for one is true for the other. – Apr 20 '11 at 07:51
@JackPDouglas: "As far as I know, `null=null` always evaluates to `null`" -- then what you think you know is in fact incorrect. `NULL = NULL` evaluates to UNKNOWN which is a *logical* value (NULL is a *data* value). – onedaywhen Apr 20 '11 at 08:16
@JackPDouglas: "`null=null` is sometimes treated as false" -- now you are very wide of the mark. It just so happens my proof is sound but you can choose not to believe it if you like, your loss :) But you have kind of proved my point: most SQL coders, including those who advocate the deliberate use of `NULL`, do not understand the fundamentals of SQL's three value logic. – onedaywhen Apr 20 '11 at 08:20
`select (null=null) is null;` on postgres. Result is 'true'. Never heard of 'unknown' on a real database - name your RDBMS – Apr 20 '11 at 09:30
@JackPDouglas: Again, your's not valid Standard SQL syntax but have you tried `SELECT (NULL=NULL) IS UNKNOWN;` on postgre ;) – onedaywhen Apr 20 '11 at 10:55
'valid' SQL syntax doesn't mean much to me - I use Oracle and Postgres to earn a living, I only care what is 'valid' on them, not in some standard somewhere. From the postgres docs: [Notice that IS UNKNOWN and IS NOT UNKNOWN are effectively the same as IS NULL and IS NOT NULL, respectively, except that the input expression must be of Boolean type](http://www.postgresql.org/docs/current/static/functions-comparison.html). – Apr 21 '11 at 16:42
I think the PostgreSQL docs state, albeit in a subtle way, that `NULL` are `UNKNOWN` are not the same ('effectively' the same but not *actually* the same): [A null input is treated as the logical value 'unknown'](http://www.postgresql.org/docs/current/static/datatype-boolean.html), ['unknown' is represented by the SQL null value](http://www.postgresql.org/docs/current/static/functions-comparison.html). But then are the docs well written? e.g. "The boolean type can have one of only two states: "true" or "false". A third state 'unknown'..." -- so is it two states or three?! – onedaywhen Apr 27 '11 at 08:36
@JackPDouglas: I don't have PostgreSQL installed but I infer from the docs that `SELECT (NULL=NULL) IS UNKNOWN;` will return `TRUE` (point to me :), that `SELECT (NULL=NULL) IS NULL;` will return `TRUE` (point to you but remember that `NULL` is being used to "represent" `UNKNOWN` :) and that `SELECT (NULL=NULL) = FALSE` returns `FALSE` (so do you want to retract your earlier assertion, "on postgres. I guess sometimes null=null is false" ? :) – onedaywhen Apr 27 '11 at 08:43
postgres docs are [renowned for being excellent](http://stackoverflow.com/questions/2248200/whats-a-good-book-for-learning-postgres) - but I'd call that statement an oxymoron. I've filed a documentation bug and maybe it'll get changed. – Apr 27 '11 at 08:48
`select (null=null) = false` returns `null` - from which bit of the docs did you infer it would return `false`? – Apr 27 '11 at 08:51
saying "on postgres, sometimes null=null is false" is not the same as saying "on postgres, null=null is false". If I had said that I would retract it but, on postgres, sometimes `(null=null)`, or indeed `null`, is treated as `false`: `select 'A' where null=null;` or `select 'A' where null;` – Apr 27 '11 at 08:55
[link](http://archives.postgresql.org/pgsql-docs/2011-04/msg00101.php) to bug report thread for your interest – Apr 29 '11 at 07:34
@JackPDouglas: thanks for that. Shame though:both your rewrites are excellent. – onedaywhen May 03 '11 at 08:09

When is it a good idea to move columns off a main table into an auxiliary table?

9 Answers9