We have a table with a lot of columns across numerous datatypes. The nullable columns include varchar(8000)s, ints, floats, datetimes, and more (which is another issue, but for now it is what it is).

I did an analysis and determined that the table currently holds mostly NULLs in the int and datetime columns (about 80% NULL for each), which according to https://learn.microsoft.com/en-us/sql/relational-databases/tables/use-sparse-columns?view=sql-server-ver15 should be fine, since the documentation suggests a ~64% NULL threshold. In my tests, I created a copy of the table but declared the int and datetime columns SPARSE. Then I inserted the contents of the original table into the sparse copy, and the space results look worse: reserved, index_size, and unused are all bigger, while data is smaller.

Is there anything to consider besides the percentage of NULLs that could make sparse columns perform worse? Or am I misunderstanding how sp_spaceused works?
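For anyone reproducing this, here is a minimal sketch of the kind of comparison described above, with placeholder table and column names (`dbo.Orig`, `dbo.SparseCopy`, `n`, `d` are not the real schema). Rebuilding both tables after the load and passing `@updateusage` to `sp_spaceused` helps rule out fragmentation and stale usage stats skewing the numbers:

```sql
-- Hypothetical repro: original vs. sparse copy (names are placeholders)
CREATE TABLE dbo.Orig       (id int IDENTITY PRIMARY KEY, n int NULL,        d datetime NULL);
CREATE TABLE dbo.SparseCopy (id int IDENTITY PRIMARY KEY, n int SPARSE NULL, d datetime SPARSE NULL);

INSERT INTO dbo.SparseCopy (n, d)
SELECT n, d FROM dbo.Orig;

-- Rebuild both so unused/reserved pages don't distort the comparison
ALTER INDEX ALL ON dbo.Orig       REBUILD;
ALTER INDEX ALL ON dbo.SparseCopy REBUILD;

-- Refresh usage stats before comparing reserved/data/index_size/unused
EXEC sp_spaceused @objname = N'dbo.Orig',       @updateusage = N'TRUE';
EXEC sp_spaceused @objname = N'dbo.SparseCopy', @updateusage = N'TRUE';
```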
-
read up on https://stackoverflow.com/questions/3731172/how-much-size-null-value-takes-in-sql-server — as long as you don't have fixed-size datatypes, NULLs (e.g. in varchar) take no space, so changing the datatype makes it worse – nbk Apr 25 '22 at 16:00
-
I would stay away from this feature tbh. The amount of space you save is hardly worth the effort, and keep in mind that having sparse columns prevents many future operations from being online. I think compression is a much more effective technique to recover some space and reduce buffer memory requirements and other trickle-down effects. – Aaron Bertrand Apr 25 '22 at 16:38
-
[For example, this limitation has affected this site’s uptime](https://meta.stackexchange.com/questions/376015/planned-maintenance-scheduled-for-saturday-february-19-2022-at-200am-utc-fri). – Aaron Bertrand Apr 25 '22 at 16:44
-
Aaron, the reason I am looking to implement sparse columns is that we have code the developers use that selects every single column, and this table in particular has around 50 columns, which causes high memory/CPU waits. Updating this code has been brought up, but it won't happen for a while. For now, I figured this could be a good workaround to minimize the memory/CPU waits. These are not columns that I predict would ever run into the limitations you mentioned, and I researched those limitations before planning this change, so I am aware of the risks. – DBANoob Apr 25 '22 at 17:15
-
But why do you think making these columns sparse will have the exact effect you’re after (and better than compression)? Also if the app is saying SELECT * but they don’t actually _need_ these columns, you could hide them via a view. – Aaron Bertrand Apr 25 '22 at 17:47
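A sketch of the view workaround mentioned here, with hypothetical table and column names (`dbo.BigTable` and the selected columns are illustrative, not the real schema); the app's `SELECT *` would then target the view instead of the base table:

```sql
-- Hypothetical: expose only the columns the app actually needs,
-- so SELECT * against the view never reads the wide/rarely-used columns
CREATE VIEW dbo.BigTable_Slim
AS
SELECT id, customer_id, order_date, status
FROM dbo.BigTable;
```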
-
I am not as familiar with compression. But my thought was that a lot of these columns hold NULL values, and unless I misunderstood, SQL Server still allocates memory for them during these SELECTs, even for the NULLs. I will look into compression as an alternative, though. – DBANoob Apr 25 '22 at 18:15
-
Also note that for _every non-null value_ (not every row that has at least one non-null value), there's a storage penalty of 2-4 bytes. So it's quite possible that the savings you get from the null values are more than reversed by the penalty of the non-nulls (assuming you've rebuilt both tables and all indexes?). Even if the results were the opposite (and sparse bought you space), I just don't see how this specific optimization is going to solve a much broader "high memory/cpu waits" problem, which I'm still not convinced is due to the size of these columns in the first place. – Aaron Bertrand Apr 25 '22 at 19:30
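A back-of-envelope illustration of that per-value penalty, under assumed numbers (1,000,000 rows, 80% NULL, a 4-byte penalty, and ignoring the NULL bitmap and row overhead):

```sql
-- Hypothetical arithmetic: int column, 1,000,000 rows, 80% NULL
SELECT
    1000000 * 4      AS nonsparse_bytes, -- every row stores 4 bytes, NULL or not
    200000 * (4 + 4) AS sparse_bytes;    -- only non-NULLs stored, each +4 bytes overhead
-- 4,000,000 vs 1,600,000 bytes: at 80% NULL, sparse should win for int,
-- but the gap shrinks fast as the non-NULL fraction grows
```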
-
Reference from [official documentation](https://learn.microsoft.com/en-us/sql/relational-databases/tables/use-sparse-columns): `Sparse columns require more storage space for non-NULL values than the space required for identical data that is not marked SPARSE. ... The length is equal to the average of the data that is contained in the type, plus 2 or 4 bytes.` So, we shouldn't be too shocked if we apply sparse columns and don't come out ahead. Please look into compression, I can almost _guarantee_ it will have better results on space, but it'll be up to you to gauge CPU overhead. – Aaron Bertrand Apr 25 '22 at 19:33
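The compression route suggested above could be sketched like this (`dbo.BigTable` is a placeholder name); `sp_estimate_data_compression_savings` lets you preview the savings before changing anything:

```sql
-- Estimate savings first, without modifying the table
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'BigTable',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = N'PAGE';

-- Apply page compression to the table and its indexes;
-- compare CPU usage before/after, since compression trades CPU for I/O
ALTER TABLE dbo.BigTable REBUILD WITH (DATA_COMPRESSION = PAGE);
ALTER INDEX ALL ON dbo.BigTable REBUILD WITH (DATA_COMPRESSION = PAGE);
```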
-
I'm aware of the trade-off with the non-NULL values, which is why I checked how many rows were NULL before attempting this on specific columns. But even following Microsoft's official recommendation on NULL percentages, it still didn't seem to help. I am looking into data compression, which in my test environment does seem more beneficial so far. I'll be attempting it in a slightly larger environment in the coming days and will see if it helps. – DBANoob Apr 25 '22 at 21:42