4

I am going to store stories in nvarchar(MAX) fields in SQL Server, but I know the stories will be much longer than MAX allows, so what approach should I take? Should I split the story across multiple rows or should I skip using a database and use text files?

Tim Lehner
  • 14,813
  • 4
  • 59
  • 76
Xaisoft
  • 45,655
  • 87
  • 279
  • 432
  • 6
    I sincerely doubt the stories are longer than what `nvarchar(max)` allows. That is 1 billion characters. War and peace only has 560,000 words. – Martin Smith Jan 13 '12 at 15:37
  • Cue Dr. Evil jokes here. – Chris Shain Jan 13 '12 at 15:38
  • @MartinSmith, I am little confused, I thought max only allows 8000 characters. – Xaisoft Jan 13 '12 at 15:38
  • Yeah. Sounds delusional - if you REALLY get into that, the main question is how to build the server storage. Likely the poster never read the documentation abuot SQL Server data types so he does not know the limit. I wuold be surprsied to see a story that long. – TomTom Jan 13 '12 at 15:38
  • The maximum size of an `nvarchar(n)` is 8000 bytes (4000 characters). `n` being a number. Using `MAX` has the limit linked above – Andrew Barber Jan 13 '12 at 15:40
  • @TomTom - I read it, but I keep seeing 8000 bytes/characters. over 1 billion is pleny of space. – Xaisoft Jan 13 '12 at 15:40
  • 1
    @Xaisoft - No. `8,000` is the maximum non `MAX` value. If you specify `nvarchar(max)` it allows up to 2GB. – Martin Smith Jan 13 '12 at 15:40
  • @AndrewBarber - I am slightly confused, you said the nvarchar(n) has a max of 8000 bytes, but if n is MAX, then it is over 1 Billion bytes – Xaisoft Jan 13 '12 at 15:41
  • @MartinSmith - I get it now, so I can't do something like nvarchar(9000), correct? I always thought MAX referred to 8000. – Xaisoft Jan 13 '12 at 15:42
  • 1
    NVARCHAR explained: http://msdn.microsoft.com/en-us/library/ms186939.aspx – Tim Lehner Jan 13 '12 at 15:42
  • @Xaisoft - `MAX` is special. If you specify a length, the max is `8000`. If you specify `MAX` then it uses off-page storage for a BLOB `binary large object` and can be over a billion characters. – JNK Jan 13 '12 at 15:43
  • @Xaisoft - Yep. correct. Anything over `8000` is handled by specifying `max` – Martin Smith Jan 13 '12 at 15:43
  • @Xaisoft Take a look at http://msdn.microsoft.com/en-us/library/ms186939.aspx – vcsjones Jan 13 '12 at 15:43
  • 4
    @TomTom - There's no need to be rude. OP is understandably confused on this. – JNK Jan 13 '12 at 15:43
  • @JNK - You took the words right out of my mouth. I like Garmin anyway, I know stupid joke. – Xaisoft Jan 13 '12 at 15:46
  • Thanks for everyone's help, I feel like an idiot, but there are others who probably were confused as well. I probably will break the stories into chapters anyway instead of storing them in one nvarchar(MAX) field. – Xaisoft Jan 13 '12 at 15:47
  • 1
    @JNK and I think it *is* confusing. They should have made varchar(max) a separate data type or just allowed values up to 1 billion to be passed as n. Or used a different word than `max`. – D'Arcy Rittich Jan 13 '12 at 15:48
  • I have posted an answer which addresses the core misunderstanding here, I think. – Andrew Barber Jan 13 '12 at 15:48
  • @RedFilter - agreed. It's totally different data type but they left it as `varchar()` – JNK Jan 13 '12 at 15:50
  • Good to know I am not the only one confused. – Xaisoft Jan 13 '12 at 15:52
  • @Xaisoft Ignore the haters, it's a perfectly reasonable mistake to make, and one that Microsoft leads you to make by the keyword `max`. – D'Arcy Rittich Jan 13 '12 at 15:55
  • @RedFilter - Trust me, I ignore them, probably why I have 515 questions on the site, lol. I always find it funny that the questions I don't think will generate a lot of responses normally do. Thanks for the help. – Xaisoft Jan 13 '12 at 15:57

4 Answers4

7

I believe the confusion stems from a misunderstanding of terms here.

nvarchar(n) is a data type where n can be a number from 1-4000. The number n in this case has a max of 4000, which adds up to 8000 bytes (2 bytes per character).

nvarchar(MAX) is a different data type altogether - the keyword MAX is a literal, and it is not a synonym for any potential value of n in my example above. Fields of this type have a maximum length of 2^31-1 characters, or over 1 billion, which adds up to over 2 billion bytes (2 bytes per character).

The same principles apply to varchar(n) and varchar(MAX), except each character may only be 1 byte, in which case the number of characters that can be stored is double. Whether it is only 1 byte depends on the collation, as Martin Smith notes in a comment!

Community
  • 1
  • 1
Andrew Barber
  • 39,603
  • 20
  • 94
  • 123
  • 1
    Thank you for understanding my confusion. – Xaisoft Jan 13 '12 at 15:49
  • @Xaisoft You are very welcome. It was confusing that they chose that naming scheme for that data type, as it is functionally more related to the deprecated `NTEXT` than to `nvarchar(n)`. And using the key word `MAX` further would seem to lead to people believing that `MAX` was a synonym meaning *just use the maximum numeric value here* – Andrew Barber Jan 13 '12 at 15:54
  • 1
    +1 But slight nitpick. `varchar` isn't always 1 character = 1 byte. [That depends on the collation](http://stackoverflow.com/questions/176514/what-is-the-difference-between-char-nchar-varchar-and-nvarchar-in-mssql/8250586#8250586) – Martin Smith Jan 13 '12 at 16:01
  • @MartinSmith Shows you how little (never) I've worked with collations different than the default! Edited this info into the answer. – Andrew Barber Jan 13 '12 at 16:04
  • It's worth noting that the OP's mistaken belief, was indeed true on earlier versions of SQLServer, where we'd have to choose between a `nvarchar(4000)` we hoped would work or a less convenient `ntext` we knew would. – Jon Hanna Jan 13 '12 at 16:32
  • @JonHanna I *think* the OP's mistaken belief here was that `MAX` was a synonym for either 4000 or 8000, as appropriate for `nvarchar()` or `varchar()`. – Andrew Barber Jan 13 '12 at 16:36
  • 1
    @AndrewBarber yes, but given that there was (and still is) legacy documentation saying that those where the maximum, it's hardly surprising. – Jon Hanna Jan 13 '12 at 16:41
  • @JonHanna Heck... even if you skimmed the *current* documentation (and who does more than skim if all you are looking for is "what's the max value for this?") I could see how it would seem to make sense that `MAX` was some sort of synonym. I think they'd have been better off sticking with a totally separate name for the functional replacement of `NTEXT`/`TEXT`. But, that might have broken an SQL standard... (I'm chuckling right now...) – Andrew Barber Jan 13 '12 at 16:47
  • Heh. I think I'd only noticed the change myself, because I'd been complaining for so long about how SQLServer's ntext didn't behave quite like an unlimited nvarchar the way PostgreSQL's does for so long that I actively went looking for such a feature in the docs when the 2005 came out. – Jon Hanna Jan 13 '12 at 16:55
4

Store them in chapters.

This is not technical - it is pretty much impossible to have astory of 1 billion nvarchar characters (and nvarchar(max) is the "new" TEXT data type.

BUt loading and processing them will be painfull.

Store them as chapters and store a start / end page number for every chapter when it makes sense, so you can navigate a little easier.

Btw., you posted you thought it is 800 chars - that was NEVER trhe case. The limit would be 8000 bytes - if it would apply - and that would be 4000 chars unicode.

TomTom
  • 61,059
  • 10
  • 88
  • 148
  • +1 because this could be excellent additional advice for this usage, if the stories are multi-chapter stories. – Andrew Barber Jan 13 '12 at 16:08
  • If they are not they wont be EXTREMELY large anyway ;) – TomTom Jan 13 '12 at 16:10
  • 1
    Tangentially-related information: According to [wikipedia](http://en.wikipedia.org/wiki/List_of_longest_novels), the longest, single-volume novel written in English is "Poor Fellow My Country", at 850,000 words. And according to [this document](http://www.puchu.net/doc/Average_Word_Length), the average length of a word in English is 5.10 – Andrew Barber Jan 13 '12 at 16:17
  • 1
    Making it tiny compared to the storage possibilities of nvarchar - but that is a novel. I am sure I can come up with longer texts ;) – TomTom Jan 13 '12 at 18:08
2

I'd probably suggest looking into document oriented databases for something like this.

Mr Moose
  • 5,946
  • 7
  • 34
  • 69
  • 1
    -1. Always nice to switch technologies jsut because you dont even know what your current one can. I dont doubt wwhether it is good or bad, but the poster simply asks because he has misinformation and thinks the technical limit is 8000 haracters (which it never was). – TomTom Jan 13 '12 at 15:40
  • That's fine and good (and I am a big NoSQL supporter), but if he is already on SQL Server, then varchar(max) is probably the way to go (as others have suggested). – Chris Shain Jan 13 '12 at 15:40
  • @TomTom This answer doesn't deserve a -1. The OP posed his problem, Moose proposed one possible solution. It's not a bad solution (especially considering it was posted before it became obvious that the OP was misinformed). Don't -1 it just because it's not the solution you would choose. – ean5533 Jan 13 '12 at 15:42
  • Well, it is a solution ignoring that the problem the poster mentioned NEVER EXISTED. Anyone qualified to anxswer would know that. -1 stands. Suggesting other technolgoies is just not answering the question. – TomTom Jan 13 '12 at 15:43
  • @TomTom. At the time of my post, there was no suggestion that the original poster thought the limit was 8000 characters. Having said that, it still a valid point to illustrate alternatives if the original poster isn't aware they exist. I wasn't suggesting he abandon what he has, I merely suggested he look into it. – Mr Moose Jan 13 '12 at 15:46
  • At the time of your post it should ahve been clear that the limit is so high hat no "story" ever reaches it. – TomTom Jan 13 '12 at 15:47
  • @MrMoose - I appreciate the alernative approach even though I may have been misleading. – Xaisoft Jan 13 '12 at 15:52
  • @Xaisoft. No probs. I'd also check out some of the related questions listed regarding this. There are other issues to consider when storing large amounts of data in a database. Check out http://stackoverflow.com/questions/211895/storing-documents-as-blobs-in-a-database-any-disadvantages and look at the link Bill the Lizard has in his comment. – Mr Moose Jan 13 '12 at 15:57
-3

Ok you could try storing as LONGTEXT (Mysql) or TEXT (MSSQL) (if you want to store objects I think you can use BLOB) datatype?

Andi Lee Davis
  • 85
  • 3
  • 11
  • This was tagged `sql-server` so `LONGTEXT` does not apply. `TEXT` has been deprecated since SQL 2005. `nvarchar(MAX)` is the correct data type to use. – Andrew Barber Jan 13 '12 at 15:41
  • Although I never use BLOB... If I store images I store them on the server and refer to them in the DB as this saves ooodles of space in the DB. – Andi Lee Davis Jan 13 '12 at 15:41
  • -1. Text is depreciated and nvarcahr(max) should now bwe used. In fact, Text is an alias to that. – TomTom Jan 13 '12 at 15:41
  • 1
    I heard Text was going to be deprecated. – Xaisoft Jan 13 '12 at 15:43