2

I have a database field that needs to store text and HTML tags from a WYSIWYG input box in a web interface.

The text can be a maximum of 2000 characters, but the HTML tags can vary depending whether a visitor uses the WYSIWYG feature and how much he uses it.

I am using MS SQL server, but I feel the question is database independent.

For this field, should I use varchar like the following varchara 4000 (2000 chars plus 2000 chars for possible HTML tags. But HTML tags could be more than 2000 chars depending on users)? Or should I use text or lob data types (which can store unlimied data)?

What is the practical and balanced design (performance, storage, etc.) considering that there will be hundreds of millions of records and high database save and query transactions?

Many thanks!

curious1
  • 14,155
  • 37
  • 130
  • 231
  • 1
    What version of SQL Server? On modern versions I'd use NVARCHAR(MAX)... (Note I'm choosing NVARCHAR because I'm guessing if you have hundreds of millions of records, some of your users may not be writing in English...) – Matt Gibson Oct 23 '15 at 13:51
  • I am using SQL Server 2008 for the development now. And the production database will be 2012 or 2014. Is NVARCHAR(MAX) available on all htese versions? Thanks for your input! – curious1 Oct 23 '15 at 13:54
  • 2
    Yup, it's been available since 2005, now I check. But that still seems recent to me! Perhaps I'm just being paranoid today because I recently answered a question where someone was still using SQL Server 2000 as their production server :) – Matt Gibson Oct 23 '15 at 13:55
  • Matt, many many thanks for helping me out. Have a great weekend! – curious1 Oct 23 '15 at 13:59

1 Answers1

5

You should use varchar(MAX)/nvarchar(MAX), which is the successor of the text/ntext data type, since it allows for arbitrary length and does not produce additional overhead. You don't save space by restricting the size of your (n)varchars.

Personally, I would recommend nvarchars over varchars. They take up a bit more space, but you save yourself a lot of trouble once your users start to enter non-English characters.

Note that you cannot create an index on a (n)varchar(MAX) field.

Heinzi
  • 167,459
  • 57
  • 363
  • 519
  • Does nvarchar(MAX) provide the best performance in the case of hundreds of millions of records and intensive save/query operations on this field? Regards. – curious1 Oct 23 '15 at 13:55
  • 1
    @curious1: Nope, you get the **best** performance by writing your own database engine optimized to your particular purpose. If you really wanted to ask: "Is nvarchar(MAX) a good cost/performance tradeoff?", then the answer is "yes". – Heinzi Oct 23 '15 at 13:57
  • Heinzi, thanks for your post and follow up. I already use nvarchar, which I forgot to mention in the post. Regarding performance, you mean I get the best performance based on my database design and programming, correct? I am confused by "writing your own database engine". – curious1 Oct 23 '15 at 14:00
  • 1
    @curious1: I wanted to point out that "best" is not something worth aiming at. "Best" is absolute. You can *always* improve performance by making your solution more complex and more costly. – Heinzi Oct 23 '15 at 14:09
  • 3
    @curious1: Let me put it this way: I cannot guarantee that this solution will be fast enough for your purpose, but I cannot think of a faster one at the moment (with reasonable effort). If performance is important to you, I'm afraid you will have to *test* whether the solution if fast enough rather than relying on the word of random strangers on the Internet. – Heinzi Oct 23 '15 at 14:12