7

Why is it that every RDBMS insists that you tell it what the max length of a text field is going to be... why can't it just infer this information from the data that's put into the database?

I've mostly worked with MS SQL Server, but every other database I know also demands that you set these arbitrary limits on your data schema. The reality is that this is not particularly helpful or friendly to work with, because the business requirements change all the time and almost every day some end-user is trying to put a lot of text into that column.

Does anyone with some inner working knowledge of an RDBMS know why we don't just infer the limits from the data that's put into the storage? I'm not talking about guessing the type information, but guessing the limits of a particular text column.

I mean, there's a reason why I don't use nvarchar(max) on every text column in the database.
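
To make this concrete, here's a minimal sketch of the cycle I'm describing (the table and column names are made up for illustration):

```sql
-- An up-front guess at "long enough"...
CREATE TABLE Customers
(
    Id    int IDENTITY(1,1) PRIMARY KEY,
    Notes nvarchar(200) NOT NULL
);

-- ...until an end-user pastes in 500 characters, the insert fails with a
-- truncation error, and someone has to widen the column by hand:
ALTER TABLE Customers ALTER COLUMN Notes nvarchar(1000) NOT NULL;
```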

Lance Roberts
  • 22,383
  • 32
  • 112
  • 130
John Leidegren
  • 59,920
  • 20
  • 131
  • 152
  • I have no "inner working knowledge of a RDBMS", but I don't get why you think this is a problem. There are unbound data types like CLOBs. If that is what you want, then use it. If you need to do easier text comparisons, then you have to suffer the pain of typing (255) or whatever. Does not seem worth complaining about to me. But that's just my two cents. – MJB May 19 '10 at 13:20
  • 2
    It's worth noting that SQLite doesn't impose this requirement. – cikkle May 19 '10 at 13:21
  • It is logically impossible to infer a maximum length from the data actually given. How long should the database wait until it decides, "OK, I guess there will never be more than 255 characters here"? – Kilian Foth May 19 '10 at 13:28
  • possible duplicate of [Are there any disadvantages to always using nvarchar(MAX)?](http://stackoverflow.com/questions/148398/are-there-any-disadvantages-to-always-using-nvarcharmax) – Joe May 19 '10 at 13:41
  • I agree with the sentiment: why not start with an initial guess of text length and then have an option (per server, db, table or column) to allow the DBMS to automatically resize/reindex the column if a value greater than the width is inserted? – Even Mien May 19 '10 at 13:53

9 Answers

7

Because computers (and databases) are stupid. Computers don't guess very well and, unless you tell them, they can't tell that a column is going to be used for a phone number or a copy of War and Peace. Obviously, the DB could be designed so that every column could contain an infinite amount of data -- or at least as much as disk space allows -- but that would be a very inefficient design. In order to get efficiency, then, we make a trade-off and make the designer tell the database how much we expect to put in the column. Presumably, there could be a default so that if you don't specify a length, the database simply uses that default. Unfortunately, any default would probably be inappropriate for the vast majority of people from an efficiency perspective.
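
Any inference would also only ever see the past, not the future. A query like the one below (the table and column names are hypothetical) can report what has been stored so far, but not what next week's insert will look like -- which is exactly why the declaration is left to the designer:

```sql
-- Hypothetical audit: the declared limit vs. the longest value seen so far.
SELECT
    c.CHARACTER_MAXIMUM_LENGTH AS DeclaredLimit,
    (SELECT MAX(LEN(Notes))
       FROM Customers)         AS LongestValueSoFar
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_NAME  = 'Customers'
  AND c.COLUMN_NAME = 'Notes';
```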

tvanfosson
  • 524,688
  • 99
  • 697
  • 795
  • A column containing a phone number will generally settle at around 10 characters. When this is the case, it makes perfect sense for the database to treat that as, say, varchar(13). For columns that vary greatly, where there's no consensus, the worst case would be that the column defaults to varchar(max), and for these scenarios it would be useful to have a self-tuning text data type. – John Leidegren May 20 '10 at 12:47
  • @John - so what you're really asking is not for current databases to simply infer a default, but rather that database storage engines fundamentally change the way they allocate storage. I honestly haven't done much research into that topic, but I would imagine that eventually schemes like that end up pulling character data out into its own "container", much like, and with the same problems as, varchar(max). It's an interesting thought experiment, but not particularly relevant to my day to day work. – tvanfosson May 20 '10 at 13:15
  • I guess I am. Though, I believe the varchar(max) data type is far better for these scenarios than I originally thought. It seems that it actually makes some in-row/out-of-row storage decisions depending on the size of the value. – John Leidegren May 22 '10 at 11:13
2

This post not only answers your question about whether to use nvarchar(max) everywhere, but it also gives some insight into why databases historically didn't allow this.

Community
  • 1
  • 1
Marcelo Cantos
  • 181,030
  • 38
  • 327
  • 365
2

It has to do with speed. If the max size of a string is specified, the database can optimize the way the information is stored for faster I/O. When speed is key, the last thing you want is a sudden shuffling of all your data just because you changed a state abbreviation to the full name.

With the max size set, the database can allocate the maximum space to every value in that column, and no matter how the value changes, no addresses need to change.
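
As a rough sketch of that trade-off (the table and column names are made up), compare a fixed-width column, which always occupies the same space and can be updated in place, with a variable-width one, where growing a value may force the storage engine to shuffle data to make room:

```sql
-- Fixed width: 'TX' and 'Texas' both occupy 20 bytes, so updates stay in place.
CREATE TABLE AddressesFixed
(
    Id    int PRIMARY KEY,
    State char(20) NOT NULL
);

-- Variable width: only the actual length is stored, so this update grows the
-- row and the engine may have to move it or split the page to fit it.
CREATE TABLE AddressesVariable
(
    Id    int PRIMARY KEY,
    State varchar(20) NOT NULL
);

INSERT INTO AddressesVariable VALUES (1, 'TX');
UPDATE AddressesVariable SET State = 'Texas' WHERE Id = 1;
```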

Covar
  • 901
  • 1
  • 7
  • 15
  • 1
    but it does not do that - bad news. Any decent database does NOT use 3000 bytes to store a varchar(3000) field with only 4 chars ;) Long time ago - yes. For the last 20 years - no. – TomTom May 19 '10 at 13:36
  • 1
    @TomTom: However, it's useful for the database to know that the varchar(3000) field will not take more than 3K characters. It's really hard to set up a good mapping of rows to disk sectors without knowing how big a row can get. – David Thornley May 19 '10 at 14:06
  • How is me saying mycolumn varchar(max) different from the database querying the table for MAX(LEN(mycolumn))? At some point in time it will always be able to tell that a row's mycolumn value has a particular size; that size just won't be constant. – John Leidegren Jun 09 '10 at 05:54
1

This is like saying: why can't we just tell the database we want a table and let it infer what types and how many columns we need from the data we give it?

Simply, we know better than the database will. Suppose you have a one-in-a-million chance of putting a 2,000-character string into the database; most of the time, it's 100 characters. The database would probably blow up or refuse the 2,000-character string. It simply cannot know that you're going to need a 2,000-character column if for the first three years you've only entered 100-character strings.

Also, the declared character lengths are used to optimize row placement so that rows can be read/skipped faster.
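
As a hypothetical illustration (the table and column names are made up), sampling the existing data would only tell the database what it has already seen -- the outliers that matter are the ones that haven't arrived yet:

```sql
-- Bucket existing values by length: most rows land in the low buckets, and
-- nothing hints at the rare 2,000-character entry still to come.
SELECT
    (LEN(Comments) / 100) * 100 AS LengthBucket,  -- 0, 100, 200, ...
    COUNT(*)                    AS RowsInBucket
FROM Feedback
GROUP BY (LEN(Comments) / 100) * 100
ORDER BY LengthBucket;
```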

Malfist
  • 31,179
  • 61
  • 182
  • 269
0

I think it is because an RDBMS uses random data access. To do random data access, it must know which address on the hard disk to jump to in order to read the data quickly. If every row of a single column had a different data length, it could not infer the starting address it would have to jump to directly to get the value. The only way would be to load all the data and scan through it.

If the RDBMS changed the data length of a column to a fixed number (for example, the max length over all rows) every time you add, update, or delete, it would be extremely time consuming.
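
Here is a small sketch of the fixed vs. variable contrast this argument relies on (the table and column names are made up). With a fixed-width column, every value occupies the same number of bytes, which is what makes a row's position easy to predict; a variable-width column stores only what each value actually needs:

```sql
-- Hypothetical demo: compare the bytes actually stored for the same value.
CREATE TABLE WidthDemo
(
    FixedCol    char(100),     -- always padded out to 100 bytes
    VariableCol varchar(100)   -- stored at its actual length
);

INSERT INTO WidthDemo VALUES ('abc', 'abc');

SELECT DATALENGTH(FixedCol)    AS FixedBytes,    -- 100
       DATALENGTH(VariableCol) AS VariableBytes  -- 3
FROM WidthDemo;
```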

vodkhang
  • 18,639
  • 11
  • 76
  • 110
  • except they manage to really optimize that with a varchar - a varchar(3000) does not allocate all 3000 bytes all the time anyway ;) – TomTom May 19 '10 at 13:36
  • @TomTom - sounds like there is no argument for not setting every one to (8000) since memory isn't an issue. – JeffO May 19 '10 at 15:26
0

What would the DB base its guess on? If the business requirements change regularly, it's going to be just as surprised as you. If there's a reason you don't use nvarchar(max), there's probably a reason it doesn't default to that as well...

joelt
  • 2,672
  • 2
  • 24
  • 32
0

Check this thread: http://www.sqlservercentral.com/Forums/Topic295948-146-1.aspx

Sakhawat
  • 405
  • 2
  • 6
  • 20
0

For the sake of an example, I'm going to step into some quicksand and suggest you compare it with applications allocating memory (RAM). Why don't programmers ask for/allocate all the memory they need when the program starts up? Because often they don't know how much they'll need. This can lead to apps grabbing more and more memory as they run, and perhaps also releasing memory. And you have multiple apps running at the same time, and new apps starting, and old apps closing. And apps always want contiguous blocks of memory, they work poorly (if at all) if their memory is scattered all over the address space. Over time, this leads to fragmented memory, and all those garbage collection issues that people have been tearing their hair out over for decades.

Jump back to databases. Do you want that to happen to your hard drives? (Remember, hard drive performance is very, very slow when compared with memory operations...)

Philip Kelley
  • 39,426
  • 11
  • 57
  • 92
  • I don't see how this is relevant. I would never let the user govern arbitrary memory allocations. That's just irresponsible and potentially a security risk. The database can make up its mind based on statistics; if no consensus can be reached, then it defaults to, say, varchar(max), but that's unlikely to happen every time. – John Leidegren May 20 '10 at 12:41
0

Sounds like your business rule is: Enter as much information as you want in any text box so you don't get mad at the DBA.

You don't allow users to enter 5000 character addresses since they won't fit on the envelope.

That's why Twitter has a text limit and saves everyone the trouble of reading through a bunch of mindless drivel that just goes on and on and never gets to the point, but only manages to infuriate the reader making them wonder why you have such disregard for their time by choosing a self-centered and inhumane lifestyle focused on promoting the act of copying and pasting as much data as the memory buffer gods will allow...

JeffO
  • 7,957
  • 3
  • 44
  • 53