6

We are currently looking at setting our string columns to nvarchar(max) rather than specifying a specific length, to prevent any problem where there is not enough room in the database to store the string. I'm just wondering if this is a good thing, or whether it could cause any problems. If it is OK to do, then why specify a length like nvarchar(10) rather than nvarchar(max)? We also use varbinary(max) a lot, since we don't know how much binary data we will need, so I'm not sure how much of an effect that has either, given that our inserts are not as fast as I think they should be. This is an example table:

CREATE TABLE [dbo].[SAMPLETABLE] (  
[ID] [uniqueidentifier] NOT NULL,  
[FIELD1] [int] NOT NULL,  
[FIELD2] [nvarchar] (2000) NULL,  
[FIELD3] [nvarchar] (max) NULL,  
[FIELD4] [uniqueidentifier] NULL,  
[FIELD5] [int] NULL,  
[FIELD6] [nvarchar] (2000) NULL,  
[FIELD7] [varbinary] (max) NULL,  
[FIELD8] [varbinary] (max) NULL,  
[FIELD9] [varbinary] (max) NULL,  
[FIELD10] [uniqueidentifier] NULL,  
[FIELD11] [nvarchar] (2000) NULL,  
[FIELD12] [varbinary] (max) NULL,  
[FIELD13] [varbinary] (max) NULL,  
[FIELD14] [bit] NULL,  
[FIELD15] [uniqueidentifier] NULL,  
[FIELD16] [varbinary] (max) NULL,  
[FIELD17] [bit] NULL,  
[FIELD18] [tinyint] NULL,  
[FIELD19] [datetime] NULL,  
[FIELD20] [nvarchar] (2000) NULL,  
PRIMARY KEY CLUSTERED   
(  
    [ID] ASC  
)
) ON [PRIMARY]  

GO

Given a table design like that, would changing the nvarchar(2000) columns to nvarchar(max) make things any worse (or better)? Does SQL Server frown upon designs like this?

HLGEM
user455095
  • What sort of data are you storing? The only *problems* will be with indexing, searching and constraints. That doesn't make the change a good idea, though. – Matthew Feb 02 '11 at 17:57
  • 8
    **please don't do it!** If I were hired at a place that had all their tables like this, I'd run screaming out the door! Then you add in the clustered uniqueidentifier PK on top of all the nvarchar(max) columns; yuck. You are killing your ability to index your data. Someday soon, you'll be back asking a question about why your query runs so slow, and there won't be much you'll be able to do to speed it up. Back in the day, all the main/popular languages were strongly typed, but not so much now. You will run into problems if you try to use that crutch in a database. – KM. Feb 02 '11 at 19:40
  • @KM I'd upvote your comment a million times if I could; this is horrible database design. – HLGEM Feb 02 '11 at 20:24
  • Yes, it could be worse: we don't have any relationships; all our relationships are stored in these binary fields. Yes, you are probably wondering if I have lost my mind. Given that, we can't do a whole lot of querying, since SQL Server doesn't know anything about our relationships. For the most part we just have a bunch of disconnected tables that store raw data. I'm just wondering if something like this would affect performance with paging and things like that. – user455095 Feb 02 '11 at 21:27
  • Why are you even using a relational database? Wouldn't a NoSQL database fit that model better? – HLGEM Feb 02 '11 at 21:42
  • user455095 said `sqlserver doesnt know anything about our relationships`. It is your job to design and build your tables/relationships to leverage the strengths of SQL. If your data does not model well in SQL, then you should probably look to other storage methods. I would recommend that you ask another question, where you give as much info about your application as possible, and ask how best to store that data. Possibly someone here will have some good ideas to improve your design overall. – KM. Feb 03 '11 at 13:25
  • Yes, I was looking at some NoSQL storage solutions, but there are quite a few out there now, and we need to support Mono, so that adds another wrinkle to the story. There are a lot of wrinkles in my story. – user455095 Feb 03 '11 at 17:21

6 Answers

11

If you're happy for J. Random Developer, 6 months down the line, to insert a work by Shakespeare into each column, then fine.

For me, a big part of data modelling is seriously thinking about what data I do want to allow in each column, and which data I wish to prohibit. I then apply appropriate CHECK constraints to achieve those restrictions (as best SQL Server allows). Having a sensible length check available "for free" has always seemed like a bonus.
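
As an illustration of that approach, here is a minimal sketch (the table, column, and constraint names are invented for this example): the declared length rejects grossly oversized input for free, and a CHECK constraint tightens the rule further.

CREATE TABLE [dbo].[Contact] (  
    [ContactID] [int] NOT NULL PRIMARY KEY,  
    [Phone] [nvarchar] (20) NOT NULL,  
    CONSTRAINT [CK_Contact_Phone_Len] CHECK (LEN([Phone]) >= 7)  
);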


You're also not doing much "future proofing" - changing the length of a (n)varchar column to a larger value at a later date is, I believe, purely a meta-data operation. So I'd say size the columns appropriately for the data you're expecting to deal with today (and okay, for the next year or so). If you need to expand them later, it takes seconds to do.
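
For example, widening FIELD2 from the question's table should be a near-instant change, since (to the best of my knowledge) no row data has to be rewritten when a (n)varchar column grows:

ALTER TABLE [dbo].[SAMPLETABLE]  
    ALTER COLUMN [FIELD2] [nvarchar] (4000) NULL;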

Damien_The_Unbeliever
  • Excellent points. Allowing any length for a column is such a poor idea. You will have data integrity issues for things that should be a particular length. So an insert into the phone number column won't fail if the person tries to put in a note of 100 characters, even though no phone number is ever that long. And on and on. Plus nvarchar(max) cannot be indexed and thus causes performance problems if you need to search on those columns. – HLGEM Feb 02 '11 at 20:28
  • 4
    @HLGEM, but my phone number is longer than normal, it is: `1';drop table users;drop table orders;drop table accounts;--` – KM. Feb 02 '11 at 21:10
5

Let's hope you don't use the column for searching or have unique values...

Index key columns cannot be over 900 bytes wide, so you can probably never create an index on these columns. This is one downside, because it gives:

  • really bad searching performance
  • no unique constraints

It can be worked around with a computed column, but then why not just store what you need in the first place?
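
A rough sketch of that computed-column workaround, applied to FIELD3 from the question's table (the column and index names here are invented): indexing a fixed-length prefix stays within the 900-byte key limit, since 450 nvarchar characters occupy exactly 900 bytes.

ALTER TABLE [dbo].[SAMPLETABLE]  
    ADD [FIELD3_Prefix] AS CAST(LEFT([FIELD3], 450) AS nvarchar(450)) PERSISTED;  

CREATE INDEX [IX_SAMPLETABLE_FIELD3_Prefix]  
    ON [dbo].[SAMPLETABLE] ([FIELD3_Prefix]);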

gbn
4

Switching from the in-row types to the BLOB types is always a big decision. You have to internalize that the BLOB types (VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX)) are completely different types internally from the in-row types.

So switching all columns to BLOB types might bring in a lot of side effects you have not considered: you cannot index the BLOB columns, online index operations are unavailable, and there is a general performance degradation from the inherently slower BLOB code paths. The most serious hurdle may be that you won't be able to index the columns after making them BLOBs. If that is not a show stopper, you'll have to test and measure the performance impact.
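
To make the indexing limitation concrete, here is a sketch against the question's table (the index names are made up): a MAX column is rejected as an index key, although it can still ride along as non-key INCLUDE data.

-- Fails: a MAX type is invalid as an index key column  
-- CREATE INDEX [IX_FIELD3] ON [dbo].[SAMPLETABLE] ([FIELD3]);  

-- Works: the MAX column is carried as non-key data only  
CREATE INDEX [IX_FIELD1_FIELD3]  
    ON [dbo].[SAMPLETABLE] ([FIELD1]) INCLUDE ([FIELD3]);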

The data modeling concerns others have raised are in general valid, but I understand that often in the real world the theory works only in theory...

Remus Rusanu
  • Thanks for the "ONLINE" tip. Sometimes it is critical to add an index with the ONLINE option. It is worth pointing out that this limitation only applies to "SQL Database prior to V12, and SQL Server prior to SQL Server 2012", according to the link. – Lionet Chen Aug 23 '17 at 09:08
3

The answer is the same as the answer to "Why do I need to specify an int when I can store all numbers as strings?" - because it aids:

  • efficiency of speed;
  • efficiency of storage;
  • the author/architect's intention;
  • cutting down on data errors, since only a certain kind of data will fit.

But it won't cause any obvious "problems" immediately, because nvarchar(10) is a subset of nvarchar(max).
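
A tiny illustration of that trade-off (the demo table is hypothetical): a declared length turns bad data into a hard error at the database boundary, where nvarchar(max) would silently accept it.

CREATE TABLE [dbo].[TypedDemo] ([Code] [nvarchar] (10) NOT NULL);  

-- Fails with "String or binary data would be truncated":  
INSERT INTO [dbo].[TypedDemo] ([Code])  
VALUES (N'far too long for ten characters');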

ian
0

Here is the same answer I gave to another guy who wanted endless tables:

Database alternative to MySQL made for millions of TABLES

The design above is less than optimal for any relational data storage. It turns retrieval into pop-goes-the-weasel for data: pick a column and you might get the data you want.

Perhaps a NoSQL solution would work better, so you can have dynamic data and not worry about column limits.

If we are going to answer questions, then I think it behooves us to offer best (or at least better) practices when there are alternatives to a bad design.

Think of the person coming after you, as KM said above.

Tab
0

In my experience, not many 2000-character fields end up indexed anyway. I think it's much better to use nvarchar(max) than some arbitrary length where you might have to truncate data that turns out to be longer.

An example I saw was an error log table where the designers had not been prepared to store the call stack in an nvarchar(max) field, so they stored only the first n-thousand characters, resulting in truncated call stacks with the most interesting sections missing.
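
A sketch of how such a table might look (the names here are hypothetical): the call stack genuinely has no sensible upper bound, so nvarchar(max) is defensible for it, while the bounded fields keep explicit lengths.

CREATE TABLE [dbo].[ErrorLog] (  
    [ErrorLogID] [int] IDENTITY(1,1) PRIMARY KEY,  
    [LoggedAt] [datetime] NOT NULL DEFAULT GETDATE(),  
    [ErrorMessage] [nvarchar] (2000) NULL,  
    [CallStack] [nvarchar] (max) NULL  
);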

Cato