14

When it comes to column order in DB tables, are there any standards or at least best practices?

Here's a handmade convention that I follow:

  • primary key (i.e. id);
  • unique columns (i.e. email, ssn);
  • foreign keys (i.e. article);
  • columns holding user generated data (i.e. first_name, last_name);
  • columns holding system generated data;
    • non-boolean (i.e. password_hash);
    • boolean (i.e. deleted, verified)
  • timestamp columns (i.e. created_at);

These leave many questions unanswered, though, so I'd like to hear your thoughts.

Emanuil Rusev
  • 34,563
  • 55
  • 137
  • 201
  • 2
    order should not be important in any modern database other than for human readable organization –  Jan 21 '10 at 23:15
  • Best practices for column names are more important than order IMHO. I append the suffix "_PK" to my primary keys and "_FK" to my foreign keys; picked up the habit from another database guy. – Scott Jan 22 '10 at 00:27

4 Answers4

8

In short, you've stated the standard conventions well and you're not missing a lot. IMO, the only move that would make someone look unprofessional would be not having the Primary Key(s) first. Having the foreign keys come right after that is a nice convention, but not a big deal. (Multi-field primary keys that include foreign keys should of course be at the very beginining, or someone should be beaten.) I would add two additional thoughts:

  1. Have fields with similar themes near each other. Having City/State/Zip fields widely separated would be unhelpful, for example. I think it would not matter in the slightest whether user_role or user_ip came first, but they sound like they should be next to each other.
  2. Secondary to other such conventions, it doesn't hurt for things to be alphabetical.

Having additional conventions within your database is a very good idea (like as you mention always having the timestamp at the end). If you have ChangeDate and ChangeBy fields in a lot of your tables, having them (obvously next to each other and) consistently located is good.

Additionaly, ErikE mentioned that there can be some efficiency to having, at the end of your table, the variable length fields (varchar, nvarchar) that might often contain nulls. Other than that, I don't think there are any performance advantages to arranging things a certain way in modern relational databases.

Naming

Often when you're deciding column order is the same time you're deciding on column names, so I'd like to address that a little. You can certainly make horribly, costly mistakes with the naming of your fields; this is much more important than your column ordering. Ordering can be changed easily, but poor names will cause you problems forever. It's a huge pain to change table/column names a year later when there's dozen's of references to them. I just added an answer here to address this very important topic.

Community
  • 1
  • 1
Patrick Karcher
  • 22,995
  • 5
  • 52
  • 66
  • I call my tables "somethings" (plural) and my primary key "something" (singular). – Tor Valamo Jan 21 '10 at 23:19
  • Yep, perfect. Customers.CustomerID. I worked with someone who prefixed standard data tables with *tbl*, and had a few other prefixes. Done well, this is okay I think, but not necessary. – Patrick Karcher Jan 22 '10 at 00:30
  • nouI verThink nouPrefixes verAre adjmuch adjWorse conThan nouNot adjNecessary. I think they are sloppy and clutter the database in a harmful way. – ErikE Jan 22 '10 at 18:22
  • I love it how your answer evolves over time. – Emanuil Rusev Jan 22 '10 at 21:42
6

In MSSQL Server, NULL columns at the end of the column list actually reduce the space required to store that row, which can increase the number of rows per page, which can reduce the number of reads required per I/O operation, which is a performance benefit. While the performance benefit may not be huge, it is something to keep in mind for any column that has a preponderance of NULL values.

Proof of trailing NULLs reducing storage space can be had at Deciphering a SQL Server data page:

... The null bitmap is slightly different (fe / 1111 1110) since it's now the second column that's null. What's interesting is that in this row, only a single variable length column is present, not two. Thus there's only a single variable length column end index identifier, 0d00 / 0x000d / 13. From that we can conclude that columns are handled in order, and thus one might want to consider the order of columns, if a specific column is usually null, it might be more efficient to have it ordered last.

Note that this applies only to variable-length columns. While that clearly includes varchar, varbinary, and so on, I'm not sure about other data types (and don't have time right now to conclusively determine this).

ErikE
  • 48,881
  • 23
  • 151
  • 196
1

In MS Sql Server, datatypes ntext, image, and text (all recently deprecated) should be the last columns in the row to avoid a performance penalty.

egrunin
  • 24,650
  • 8
  • 50
  • 93
-1

you can find different best-practices allong the net.

Always save CREATE TABLE statements, along with all other statements defining database schema in a secure location. Every time you make a change to a database object, be sure to script the change and check it into version-control software, such as Visual Source Safe.

With such policy you can easily re-create database schema on the same or different server, if necessary. Also, if you have the same database on multiple servers, it's easy to compare schemas and reconcile any differences that might have crept in over time.

Although descriptive, table names have no performance benefits. They make databases self-documenting and easier to code against. Table names should reflect their business meaning.

Create user tables on a non-primary filegroup; reserve the primary file group for system objects. This way the system supplied and user-defined objects do not compete for disk resources.

Create commonly accessed tables on the same filegroup. You can expect performance benefits if the data of commonly joined tables resides on the same disk.

Create a clustered index on every table. Each table can only have a single clustered index. If a table has a clustered index, its data is physically sorted according to the clustered index key. Clustered indexes in SQL Server have numerous benefits. For example, if you retrieve data from a table using an ORDER BY clause referencing the clustered index key, the data does not need to be sorted at query execution time.

If two tables have a common column, for example customer_id, and both tables have clustered indexes on customer_id column joining, such tables will be considerably more efficient than joining the same tables based on the same column but without clustered indexes.

Ensure the clustered index is built on a column that contains distinct

Source: Creating SQL Server tables: A best practices guide

BalusC
  • 1,082,665
  • 372
  • 3,610
  • 3,555
R van Rijn
  • 909
  • 10
  • 16
  • 1
    Any database worth its salt will output the existing schema as text.... at least postgres, mysql, ms-sql, and oracle do it. – Joe Koberg Jan 21 '10 at 22:57
  • down vote for recommending Visual SourceSafe for anything, especially in 2010 –  Jan 21 '10 at 23:16
  • @fuzzy lollipop: what do you recommend instead? I'm looking for something that integrates with SQL Server databases (ideally with a management studio plugin) – ErikE Jan 22 '10 at 02:03
  • @fuzzy lollipp: It was not my intention to recommend Visual SourceSafe. I worked with it and i know its crap. – R van Rijn Jan 22 '10 at 06:16
  • you don't need something that "integrates" with SQL Server. Any version control system will work, just don't use Visual Source Safe, in 15 years it is the only version control system I have actually lost data to numerous times. –  Jan 22 '10 at 16:26
  • I said integrates literally, not in any figurative sense, not "integrates." What I want is something that either updates the contents of stored procedures itself or peeks at submitted batches to catch updates to stored procedures and keep version control in synch. What I don't want is "remember, every time you change a stored procedure you have to ALSO check it into version control!" I'd much prefer "remember, the only place you're allowed to alter stored procedures is in SSMS." I would love something that also managed privileges so that checkin/out of objects grants/revokes alter permission. – ErikE Jan 22 '10 at 18:27