14

As part of a bulk load of data from an external source the stageing table is defined with varchar(max) columns. The idea being that each column will be able to hold whatever it finds in the source CSV file, and that we'll validate the data (for type, size, percision etc) later.

But I'm concerned that the varchar(max) column has a lot of overhead for columns that have less than 200 characters. The fellow that designed this assures me this is best practice for ETL but I thought I would validate that assertion with the community.

Ralph Shillington
  • 20,718
  • 23
  • 91
  • 154

5 Answers5

18

VARCHAR(MAX) column values will be stored IN the table row, space permitting. So if you have a single VARCHAR(MAX) field and it's 200, 300 byte, chances are it'll be stored inline with the rest of your data. No problem or additional overhead here.

Only when the entire data of a single row cannot fit on a single SQL Server page (8K) anymore, only then will SQL Server move VARCHAR(MAX) data into overflow pages.

So all in all, I think you get the best of both worlds - inline storage when possible, overflow storage when necessary.

Marc

PS: As Mitch points out, this default behaviour can be turned off - I don't see any compelling reasons to do so, however....

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • I have a varchar(max) column which is always empty. It is only there for legacy reasons (and makes my older code in ASP.net work without change). So if I get you right, my database would not become much smaller (if smaller at all) if I remove it. Right? – Tillito Jun 03 '13 at 14:08
  • According to Jeff Hall's answer it depends of the number of rows. Assuming 1 Mio. rows I would save 2 MB. Correct? – Tillito Jun 03 '13 at 14:11
6

The storage overhead is the same between varchar(n) and varchar(max) The storage size is the actual length of data entered + 2 bytes

MSDN Reference

Check out these similar SO questions:

https://stackoverflow.com/questions/166371/varcharmax-versus-varcharn-in-ms-sql-server Are there any disadvantages to always using nvarchar(MAX)?

Community
  • 1
  • 1
Jeff Hall
  • 1,694
  • 11
  • 7
3

As far as I know, the overhead that you are probably thinking about (storing the data out-of-row in the same way a TEXT or BINARY value is stored in sql server) only applies if the data size exceeds 8000 bytes. So there shouldn't be a problem using this with smaller columns for ETL processes.

Steve Willcock
  • 26,111
  • 4
  • 43
  • 40
  • 2
    depends on the setting of the table option "large values types out of row", If set to "ON" it will use 16 byte pointers to store the data outside the table. – Mitch Wheat May 13 '09 at 14:36
  • 2
    Mitch: what would ever make you switch that option on? Seems to me like you'd loose all the benefits of the MAX types, no? – marc_s May 13 '09 at 14:54
0

If you use an varchar(max) or varbinary(max) in MSSQL2005 SSIS is creating a temporary file for each column in your record this can drop you performance and become a big problem. MS claims that they solved this issue in MSSQL2008.

JSC
  • 3,705
  • 3
  • 26
  • 25
  • 2
    This sounds vry strange --- you'll have to back that up with some references --- for example a link to the Microsoft "claim" – Ralph Shillington May 29 '09 at 13:09
  • I've seen it with my own eyes. We have an SSIS server with 16 cores and 64 gb of RAM and when we create over somewhat 8 million records with varchar(max)/varbinary(max) it will take more then 12 hours to create them all. But if we use varchar(8000)/varchar(8000) then it is ready in several minutes. I will try to find the claim, I've seen it somewhere on the net. – JSC Jun 04 '09 at 20:01
  • This is not the exact thing but MS says that they improved the allocation of creating temp-files. http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=346675 – JSC Jun 04 '09 at 20:06
-1

Well I want to say that there shouldn't be that big an overhead because i don't think that sql automatically assigned an alloted amount of data for nvarchar, and instead only allots what is needed for what is inserted, but i don't have naything to prove or back up that idea.

DForck42
  • 19,789
  • 13
  • 59
  • 84