10

I have read that varchar fields should be placed at the end of a database table, at least in MySQL. The reason given is that varchar fields have variable length, which could slow down queries. My question: does this apply to MSSQL 2012 or not? Should I design my tables so that all textual data sits at the end of every row?

Zsolt
  • I'm not convinced this is even true for MySQL. Yes, it's variable width - but what difference does it make where in the row it is? – Ariel Sep 04 '12 at 21:57
  • 2
    Possible duplicate of [Is there any reason to worry about the column order in a table?](http://stackoverflow.com/questions/894522/is-there-any-reason-to-worry-about-the-column-order-in-a-table) See also [@Quassnoi's blog article](http://explainextended.com/2009/05/21/choosing-column-order/) on this subject. – eggyal Sep 04 '12 at 21:59
  • the thinking is that the row may be updated at some point, causing a 'chained row' where the data is somewhere else on disk, causing more disk io... probably minor, but maybe makes some small difference sometimes – Randy Sep 04 '12 at 22:00
  • @Randy I don't believe MySQL has such a thing as chained rows. Instead the old row is invalidated and the updated one written at the end (or in a free spot). myisam and innodb do it differently, but I don't believe either has chained rows. However the issue in the linked answer of having to seek past the variable length columns is interesting. – Ariel Sep 04 '12 at 22:23

2 Answers

9

The order of columns in a table will have a very small impact on performance, as compared to the performance impact of your database design (entities, attributes and relationships), your transaction design and your query design.

To tell whether the difference is non-negligible, you'd really need to set up some tests and compare the results.

Typically, I put the primary key as the first column, then the foreign key(s), and then natural keys and frequently accessed columns. I typically put the longer strings towards the end of the row. But this isn't necessarily a performance optimization, as much as it is a style preference which I use for convenience.

The order of columns can have an impact on the size of the row in SQL Server when a large number of columns in a row are nullable and most of those columns contain NULL. SQL Server (like Oracle) has an optimization where no space is reserved for columns that contain NULL values AT THE END of the row. Some space is reserved for every column in the row, up to the last non-NULL value in the row.

The takeaway from that is that if you have a lot of nullable columns, you want the columns that are most frequently not NULL BEFORE the columns that are most frequently NULL.
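The rule above can be illustrated with a toy model. This is only a sketch of the behavior as described in this answer, not SQL Server's actual on-disk format; the function name `slots_reserved` and the sample rows are hypothetical.

```python
def slots_reserved(row):
    """Count column slots reserved under the simplified rule:
    every column up to the last non-NULL value takes a slot,
    and trailing NULLs take none."""
    last = -1
    for i, value in enumerate(row):
        if value is not None:
            last = i
    return last + 1


# The same data in two column orders (None stands for NULL).
mostly_null_first = (None, None, None, 42, "abc")
mostly_null_last = (42, "abc", None, None, None)

print(slots_reserved(mostly_null_first))  # 5
print(slots_reserved(mostly_null_last))   # 2
```

In this model, moving the usually-NULL columns to the end cuts the reserved slots from 5 to 2 for the same data.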

NOTE: Keep in mind that SQL Server orders the columns within a table first by whether the column is fixed length or variable length. All of the fixed length columns are stored first, then followed by all of the variable length columns. Within those sets of columns (fixed and variable), the columns are stored in the order they are defined.
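As a rough illustration of that note, the sketch below reorders a column list the way the note describes: fixed-length columns first, then variable-length columns, each group keeping its definition order. The column names and the `physical_order` helper are made up for the example; this models only the ordering rule, not the real storage engine.

```python
def physical_order(columns):
    """columns: list of (name, is_variable_length) in definition order.
    Returns the names with fixed-length columns first, then
    variable-length ones, preserving definition order in each group."""
    fixed = [name for name, is_var in columns if not is_var]
    variable = [name for name, is_var in columns if is_var]
    return fixed + variable


# Hypothetical table definition, in the order the columns were declared.
defined = [
    ("id", False),
    ("title", True),
    ("created_at", False),
    ("body", True),
    ("score", False),
]

stored = physical_order(defined)
print(stored)  # ['id', 'created_at', 'score', 'title', 'body']
```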

spencer7593
3

When it comes to creating an index, column order does matter.

An index key is sorted on the first column of the index and then subsorted on the next column within each value of the previous column. The first column in a compound index is frequently referred to as the leading edge of the index. For example, consider this table:

c1  c2
1   1
2   1
3   1
1   2
2   2
3   2

If a composite index is created on the columns (c1, c2), then the index will be ordered as shown in this table:

c1  c2
1   1
1   2
2   1
2   2
3   1
3   2

As shown in the above table, the data is sorted on the first column (c1) in the composite index. Within each value of the first column, the data is further sorted on the second column (c2).
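The ordering described above is the same as sorting the key tuples lexicographically. A minimal sketch, using plain Python tuples as a stand-in for index entries (the variable names are illustrative only):

```python
# Rows of the example table as (c1, c2) pairs, in their original order.
rows = [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2)]

# A composite index on (c1, c2) orders entries by c1, then by c2.
index_c1_c2 = sorted(rows)

# A composite index on (c2, c1) orders the same rows by c2, then by c1.
index_c2_c1 = sorted(rows, key=lambda r: (r[1], r[0]))

print(index_c1_c2)  # [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]
print(index_c2_c1)  # [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2)]
```

The first result matches the second table above; swapping the column order in the index gives a different arrangement of the same rows.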

Therefore, the column order in a composite index is an important factor in the effectiveness of the index. You can see this by considering the following:

  • Column uniqueness
  • Column width
  • Column data type

For example, suppose most queries on table t1 look like these two:

SELECT * FROM t1 WHERE c2 = 12

SELECT * FROM t1 WHERE c2 = 12 AND c1 = 11

An index on (c2, c1) will benefit both queries. An index on (c1, c2) is not appropriate, because it sorts the data on c1 first, whereas the first SELECT statement filters only on c2 and so needs the data sorted on c2.
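To make the leading-column point concrete, here is a small sketch that models an index as a sorted list of key tuples and uses binary search as a stand-in for an index seek. It is an analogy under simplifying assumptions, not how SQL Server implements its B-trees; the sample data and the helpers `seek_c2` and `scan_c2` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows of t1 as (c1, c2) pairs.
rows = [(c1, c2) for c1 in range(10, 14) for c2 in range(10, 14)]

# Index entries stored as sorted key tuples.
index_c2_c1 = sorted((c2, c1) for c1, c2 in rows)
index_c1_c2 = sorted((c1, c2) for c1, c2 in rows)


def seek_c2(index, value):
    """c2 is the leading column of a (c2, c1) index, so binary search
    can jump straight to the contiguous run of matching entries."""
    lo = bisect_left(index, (value,))
    hi = bisect_right(index, (value, float("inf")))
    return index[lo:hi]


def scan_c2(index, value):
    """In a (c1, c2) index the matches for c2 are scattered,
    so every entry has to be checked."""
    return [entry for entry in index if entry[1] == value]


# First query: WHERE c2 = 12
matches_seek = seek_c2(index_c2_c1, 12)
matches_scan = scan_c2(index_c1_c2, 12)
print(matches_seek)  # [(12, 10), (12, 11), (12, 12), (12, 13)]
print(matches_scan)  # [(10, 12), (11, 12), (12, 12), (13, 12)]

# Second query: WHERE c2 = 12 AND c1 = 11 is a point lookup on (c2, c1).
pos = bisect_left(index_c2_c1, (12, 11))
found = pos < len(index_c2_c1) and index_c2_c1[pos] == (12, 11)
print(found)  # True
```

Both approaches return the same rows; the difference is that the seek touches only the matching range, while the scan visits every entry.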

Source: SQL Server 2008 Query Performance Tuning Distilled

Kermit
  • Did you mean: "index on (c2, c1) will benefit both the queries" instead of "index on (c1, c2) will benefit both the queries" ? – Zsolt Sep 07 '12 at 03:36
  • @Zsolt Indeed I did! Thanks for catching that. I've edited my answer. – Kermit Sep 07 '12 at 14:33