
I've read up on this on MSDN forums and here and I'm still not clear. I think this is correct: varchar(max) is stored like the old text datatype (as a LOB), so it has drawbacks. So let's say your field will reliably be under 8,000 characters, like a BusinessName field in my database table. In reality, a business name will probably always be under (pulling a number out of my hat) 500 characters. It seems like plenty of varchar fields that I run across fall well under the 8K character count.

So should I make that field a varchar(500) instead of a varchar(8000)? From what I understand of SQL Server, there's no difference between the two. So, to make life easy, I'd want to define all my varchar fields as varchar(8000). Does that have any drawbacks?

Related: Size of varchar columns (I didn't feel like this one answered my question).

jcollum
  • Imagine trying to fit a business name 500 characters long on a business card... :) – OMG Ponies Jan 05 '10 at 22:45
  • @OMG Ponies: every time I see your username I chuckle. Now, what were you saying? (Just kidding) – jcollum Jan 05 '10 at 22:48
  • @jcollum: SpaceMan Spiff will always get my vote. That's not true - *any* Calvin & Hobbes will do, but especially the snow sculpting ones. Or the tyrannosaurus flying an F-14. But I digress... – OMG Ponies Jan 05 '10 at 22:59

5 Answers


One example where this can make a difference: declaring wide columns can defeat a performance optimization that avoids adding row-versioning information to tables with AFTER triggers.

This is covered by Paul White here

The actual size of the data stored is immaterial – it is the potential size that matters.

Similarly, if you are using memory-optimized tables, since SQL Server 2016 it has been possible to use LOB columns, or combinations of column widths that could potentially exceed the in-row limit, but at a penalty:

(Max) columns are always stored off-row. For other columns, if the data row size in the table definition can exceed 8,060 bytes, SQL Server pushes the largest variable-length column(s) off-row. Again, it does not depend on the amount of data you store there.

This can have a large negative effect on memory consumption and performance.
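As a rough sketch of that point (table and column names are illustrative; requires a database with a MEMORY_OPTIMIZED_DATA filegroup): the declared width alone can push a column off-row, and each off-row column appears as an extra internal table in `sys.memory_optimized_tables_internal_attributes`:

```sql
-- Illustrative sketch: the potential row size, not the actual data stored,
-- decides whether the column is placed off-row.
CREATE TABLE dbo.DemoInMem
(
    id       INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
    name8000 VARCHAR(8000) NOT NULL   -- could exceed the in-row limit
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

-- Inspect the internal structures backing the table; off-row columns
-- show up with their own type_desc entries here.
SELECT OBJECT_NAME(object_id) AS table_name, type_desc
FROM sys.memory_optimized_tables_internal_attributes;
```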

Another case where over-declaring column widths can make a big difference is if the table will ever be processed using SSIS. The memory allocated for variable-length (non-BLOB) columns is fixed for each row in an execution tree, and is based on the columns' declared maximum length, which can lead to inefficient use of memory buffers (example). While the SSIS package developer can declare a smaller column size than the source, this analysis is best done up front and enforced there.

Back in the SQL Server engine itself, a similar case is that when calculating the memory grant to allocate for SORT operations, SQL Server assumes that varchar(x) columns will on average consume x/2 bytes.

If most of your varchar columns are fuller than that, this can lead to the sort operations spilling to tempdb.

In your case, if your varchar columns are declared as 8,000 bytes but actually hold contents much smaller than that, your query will be allocated memory that it doesn't require, which is obviously inefficient and can lead to waits for memory grants.
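You can see the gap between requested and used memory for yourself while a query is running (a sketch; the DMV requires VIEW SERVER STATE permission):

```sql
-- Over-declared varchar widths tend to show up here as requested/granted
-- memory far above what the sort actually ends up using.
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb
FROM sys.dm_exec_query_memory_grants;
```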

This is covered in Part 2 of SQL Workshops Webcast 1 downloadable from here or see below.

use tempdb;

CREATE TABLE T(
    id INT IDENTITY(1,1) PRIMARY KEY,
    number INT,
    name8000 VARCHAR(8000),
    name500 VARCHAR(500));

INSERT INTO T
    (number, name8000, name500)
SELECT number, name, name /*<-- Same contents in both cols*/
FROM master..spt_values;

SELECT id, name500
FROM T
ORDER BY number;

[Screenshot: query plan for the varchar(500) query]

SELECT id, name8000
FROM T
ORDER BY number;

[Screenshot: query plan for the varchar(8000) query, with a much larger memory grant]

Martin Smith
  • so, if almost all my values are 3 or 4 chars, cannot exceed 4 chars ever, and I want to avoid "sort operations spilling to tempdb", I will declare my column VARCHAR(8) and use a CHECK constraint to enforce that column width cannot exceed 4 chars. What do you think? – A-K Jan 26 '12 at 04:13
  • @AlexKuznetsov - For that situation I would declare them as `char(4)` as there is 2 bytes overhead per variable column anyway. – Martin Smith Jan 26 '12 at 11:55
  • valid link for SQL Kiwi post https://www.sql.kiwi/2012/08/deletes-that-split-pages-and-forwarded-ghosts.html – Charles Oct 12 '21 at 14:41

From a processing standpoint, it will not make a difference to use varchar(8000) vs varchar(500). It's more of a "good practice" kind of thing: define the maximum length a field should hold and make your varchar that length. It's something that can be used to assist with data validation, for instance constraining a state abbreviation to 2 characters or a postal/zip code to 5 or 9 characters. This used to be a more important distinction when your data interacted with other systems or user interfaces where field length was critical (e.g. a mainframe flat-file dataset), but nowadays I think it's more habit than anything else.
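As a minimal sketch of that validation idea (table and constraint names are illustrative), the declared length plus a CHECK constraint enforce the format directly in the schema:

```sql
-- Illustrative: lengths sized to the data they hold, with CHECK
-- constraints doing the rest of the validation.
CREATE TABLE dbo.Address
(
    id          INT IDENTITY(1,1) PRIMARY KEY,
    state_code  CHAR(2) NOT NULL
        CONSTRAINT CK_Address_State
        CHECK (state_code LIKE '[A-Z][A-Z]'),
    postal_code VARCHAR(10) NOT NULL
        CONSTRAINT CK_Address_Zip
        CHECK (postal_code LIKE '[0-9][0-9][0-9][0-9][0-9]'
            OR postal_code LIKE '[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]')
);
```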

BBlake
  • Makes sense... for things that naturally have a maximum length. But what do you do when the max length isn't obvious? E.g. a business name. – jcollum Jan 06 '10 at 00:11
  • For something like that, if I don't foresee any way to forecast what the size could potentially be, then I usually will go with a varchar(8000) or varchar(max), depending on the type of data – BBlake Jan 06 '10 at 15:15
  • It seems this does make a difference in performance, even in 2017: http://dba.stackexchange.com/a/162117/1822 – Jan 24 '17 at 22:04
  • More recent answers show that there *are* costs: it affects optimization logic [Martin Smith's answer](http://stackoverflow.com/a/5654947/199364) and also consider 8K total row size issues mentioned by [gbn](http://stackoverflow.com/a/2009789/199364) and [Oliver](http://stackoverflow.com/a/20700888/199364). – ToolmakerSteve Jan 26 '17 at 19:08

There are some disadvantages to large columns that are a bit less obvious and might catch you a little later:

  • All the columns you use in an index key together must not exceed 900 bytes.
  • All the columns in an ORDER BY clause may not exceed 8,060 bytes. This is a bit difficult to grasp since it only applies in some cases; see SQL 2008 R2 Row size limit exceeded for details.
  • If the total row size exceeds 8,060 bytes, the row overflows onto additional pages ("row-overflow data"). This might affect performance. (A page is an allocation unit in SQL Server and is fixed at 8 KB, part of which is overhead. Exceeding it will not be severe, but it's noticeable, and you should try to avoid it if you easily can.)
  • Many other internal data structures, buffers and, last but not least, your own variables and table variables all need to mirror these sizes. With excessive sizes, excessive memory allocation can affect performance.
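The index limit in particular bites at declaration time, regardless of the data you ever store (a sketch; table and index names are illustrative):

```sql
-- Creating the index on a varchar(1000) column succeeds, but SQL Server
-- warns that the maximum key length (900 bytes) could be exceeded.
CREATE TABLE dbo.Wide (name VARCHAR(1000));
CREATE INDEX IX_Wide_Name ON dbo.Wide (name);

-- Any insert whose key actually exceeds 900 bytes then fails at run time:
INSERT dbo.Wide VALUES (REPLICATE('x', 901));  -- raises an error
```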

As a general rule, try to be conservative with column width. If it becomes a problem, you can easily expand the column to fit. But if you notice memory issues later, shrinking a wide column may become impossible without losing data, and you won't know where to begin.

In your example of the business names, think about where you will display them. Is there really space for 500 characters? If not, there is little point in storing them as such. http://en.wikipedia.org/wiki/List_of_companies_of_the_United_States lists some company names, and the max is about 50 characters. So I'd use 100 for the column max, maybe more like 80.

Oliver

Apart from best practices (see BBlake's answer):

  • You get warnings about the maximum row size (8,060 bytes) and index key width (900 bytes) at DDL time
  • DML will die if you exceed these limits
  • ANSI PADDING ON is the default, so you could end up storing a whole load of whitespace
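A quick illustration of the padding point (a sketch; DATALENGTH counts the bytes actually stored):

```sql
-- With the default SET ANSI_PADDING ON in effect when the table is created,
-- trailing spaces in inserted varchar values are preserved, not trimmed.
CREATE TABLE #Pad (v VARCHAR(10));
INSERT #Pad VALUES ('ab   ');     -- 'ab' plus three trailing spaces
SELECT DATALENGTH(v) FROM #Pad;   -- 5: the whitespace is stored
```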
gbn
  • Just to clarify about ANSI PADDING ON: when using `nvarchar` and `varchar` types, this only means that trailing spaces are preserved upon insert--not that the values are padded with spaces to the size of the column, as in `char` and `nchar`. – Ben M Jan 05 '10 at 23:15

Ideally you'd want to go smaller than that, down to a reasonably sized length (500 isn't reasonably sized), and make sure client validation catches data that is going to be too large and sends a useful error.

While a varchar isn't actually going to reserve space in the database for unused characters, I recall versions of SQL Server having a snit about database rows being wider than some number of bytes (I don't recall the exact count) and actually throwing out whatever data didn't fit. A certain number of those bytes were reserved for things internal to SQL Server.

Otis
  • true, this used to be a lot bigger concern as well. But nowadays, space is really cheap so I don't think it's that big a concern for consideration, at least from my point of view. – BBlake Jan 05 '10 at 23:01
  • "(500 isn't reasonably sized)" for what? A name? A paragraph? A blog post? It's all very relative unless there are obvious limits, like a ZIP code or SSN. – jcollum Jan 06 '10 at 00:20
  • @jcollum: In your example, 500 doesn't seem reasonably sized for a business name. – Otis Jan 06 '10 at 03:34
  • @BBlake: Regardless of the cost of storage, if SQL Server still has row size constraints then it doesn't matter how much storage you have. You could store everything in text blobs, but there are some SQL operations you can't do on a blob that you can do on a varchar. – Otis Jan 06 '10 at 03:37
  • @Otis: my point is this: there's no actual constraint on the size of a business name. Unless there's a law somewhere. So in that case I'd make that field varchar(8000) and call it a day. My thinking goes like this: Real constraint? varchar(x). No real constraint? varchar(8000). – jcollum Jan 11 '10 at 15:35
  • I thought 30 or so chars was good for city names, until I saw El Pueblo de Nuestra Señora la Reina de los Ángeles del Río de Porciúncula – StuartLC Sep 22 '11 at 11:18