
I have a table with 200 GB of data in SQL Server 2016, and I am planning to apply a clustered columnstore index to that table for disk compression as well as better performance.

The problem is that the table has one column whose datatype is NVARCHAR(MAX), and columnstore indexes don't support that datatype.

So I am thinking of changing the datatype from NVARCHAR(MAX) to some other datatype that can hold at least 81,446 characters in the same column.

I have tried some other datatypes available in SQL Server, like VARCHAR(8000), but that truncates any data beyond 8,000 characters.

I also tried TEXT, but columnstore indexes don't support TEXT either because of the same limitation.

Could you please suggest which datatype I should use, or is there another way to apply a columnstore index to this table?


2 Answers


You have several different questions in here:

Q: Can SQL Server 2016 use (MAX) datatypes in columnstore indexes?

No. The documentation states:

Don't use a clustered columnstore index when the table requires varchar(max), nvarchar(max), or varbinary(max) data types.

I would normally just stop there - if the documentation tells you not to do something, you probably shouldn't.

Q: Can I store more than 8,000 characters in VARCHAR(8000)?

No. The number means what it says - it's the maximum number of characters you can store. If you try to store more than that, the extra data will not survive.
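
A quick way to see the truncation for yourself (the variable and values here are just a throwaway example):

    -- Build a 10,000-character string, then cast it down to VARCHAR(8000).
    DECLARE @big VARCHAR(MAX) = REPLICATE(CAST('a' AS VARCHAR(MAX)), 10000);

    SELECT LEN(@big)                        AS OriginalLength,   -- 10000
           LEN(CAST(@big AS VARCHAR(8000))) AS TruncatedLength;  -- 8000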

Q: Can I build a clustered columnstore without those (MAX) fields?

Yes, by changing your data model and breaking the table up. Say the table involved is called FactTable (there's a rough sketch of the steps after the list):

  1. Create a new table with the large text fields - we'll call it FactTable_Text.
  2. Create a new table with the rest of the fields - we'll call it FactTable_Data. Put a clustered columnstore index on this, and you'll gain compression for it.
  3. Migrate the data from your old FactTable into these new tables.
  4. Drop the old table.
  5. Create a view called FactTable that joins FactTable_Data and FactTable_Text together.
  6. Users go on querying FactTable without knowing anything has changed.
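
If it helps to see those steps spelled out, here's a rough T-SQL sketch. The column names (FactId, OrderDate, Quantity, Amount, BigText) are made up for illustration; substitute your real schema, and batch the migration inserts rather than moving 200 GB in one transaction.

    -- 1. Large text fields go in their own table (no columnstore here).
    CREATE TABLE dbo.FactTable_Text
    (
        FactId  BIGINT        NOT NULL PRIMARY KEY,
        BigText NVARCHAR(MAX) NULL
    );

    -- 2. Everything else goes in a table that can take a clustered columnstore index.
    CREATE TABLE dbo.FactTable_Data
    (
        FactId    BIGINT         NOT NULL,
        OrderDate DATE           NOT NULL,
        Quantity  INT            NOT NULL,
        Amount    DECIMAL(18, 2) NOT NULL
    );

    CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactTable_Data
        ON dbo.FactTable_Data;

    -- 3. Migrate the data from the old table.
    INSERT dbo.FactTable_Data (FactId, OrderDate, Quantity, Amount)
    SELECT FactId, OrderDate, Quantity, Amount
    FROM   dbo.FactTable;

    INSERT dbo.FactTable_Text (FactId, BigText)
    SELECT FactId, BigText
    FROM   dbo.FactTable;

    -- 4. Drop the old table, then...
    DROP TABLE dbo.FactTable;
    GO

    -- 5. ...recreate FactTable as a view so existing queries keep working.
    CREATE VIEW dbo.FactTable
    AS
    SELECT d.FactId, d.OrderDate, d.Quantity, d.Amount, t.BigText
    FROM   dbo.FactTable_Data AS d
    LEFT JOIN dbo.FactTable_Text AS t
           ON t.FactId = d.FactId;
    GO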

Unfortunately, you're probably going to have to change your ETL processes, and depending on how much text is involved in the table, you might not get any compression. For example, say 90% of the table's size is all due to the text - then you haven't really saved anything here.

Now you start to see why the documentation advises you that this isn't a good idea.

Brent Ozar
  • I liked your idea of splitting text field into another table! Cheers! – Merin Nakarmi Jan 11 '19 at 18:15
  • Even though OP mentioned SQL Server 2016, I'd like to point out for other visitors (this is a top hit on Google) that nvarchar(max) is supported in SQL Server 2017+ – Gabe Jun 28 '19 at 06:09

An alternative solution for reducing the size of tables with LOB columns is to use SQL Server's native COMPRESS() and DECOMPRESS() functions when storing and reading LOB content. Here is a link to the Microsoft docs.

I started using COMPRESS() and DECOMPRESS() when I found out that the 30 MB JSON files I was storing in a varchar(max) column were not being compressed at all by any table compression method. COMPRESS() shrank them to under 5% of their original sizes. Compressing varchar(max), nvarchar(max) and varbinary(max) values works fine.
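
For example, a minimal sketch of the write side, with a made-up table and column names:

    -- Hypothetical table: JSON payloads kept as Gzip-compressed VARBINARY(MAX).
    CREATE TABLE dbo.JsonDocuments
    (
        DocumentId  INT IDENTITY(1, 1) PRIMARY KEY,
        PayloadGzip VARBINARY(MAX) NULL
    );

    -- COMPRESS() accepts the string value directly and returns VARBINARY(MAX).
    DECLARE @json NVARCHAR(MAX) = N'{"example": "a large JSON document"}';

    INSERT dbo.JsonDocuments (PayloadGzip)
    VALUES (COMPRESS(@json));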

DECOMPRESS() returns VARBINARY, so make sure to convert or cast the result back to the original format (varchar or nvarchar) if needed.
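
Continuing the made-up example above, the read side would look something like this; the CAST target should match whatever type was compressed (NVARCHAR here):

    -- DECOMPRESS() hands back raw bytes; cast them to the original string type.
    SELECT DocumentId,
           CAST(DECOMPRESS(PayloadGzip) AS NVARCHAR(MAX)) AS PayloadJson
    FROM   dbo.JsonDocuments;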

I can't say a lot about the performance penalty. There is some, but performance was still great for what I needed to do.

COMPRESS() uses the Gzip compression standard. ETL tools that support Gzip can write already compressed content to SQL Server to be read using DECOMPRESS() later on. At least C# / .NET / SSIS can. I found a very usable C# code example on Stack Overflow.

shortski