Partitioning for query performance in SQL Server 2008

Question

I have a scenario in which there's a huge amount of status data about an item. The item's status is updated every minute, and there will be about 50,000 items in the near future. So, in one month, there will be about 2,232,000,000 rows of data. I must keep at least 3 months in the main table before archiving older data.
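
The monthly row count follows from simple arithmetic. A minimal sketch, assuming exactly one row per item per minute and a 31-day month; the constant names and the 92-day retention example (October to December) are illustrative:

```python
# Back-of-the-envelope volume estimate: 50,000 items, one status row per item per minute.
ITEMS = 50_000
ROWS_PER_ITEM_PER_DAY = 24 * 60                  # one row per minute

def rows_for_days(days: int) -> int:
    """Total rows written over `days` days."""
    return ITEMS * ROWS_PER_ITEM_PER_DAY * days

rows_per_month = rows_for_days(31)               # a 31-day month
rows_per_second = ITEMS / 60                     # average insert rate
rows_retained = rows_for_days(31 + 30 + 31)      # e.g. Oct + Nov + Dec kept online

print(f"{rows_per_month:,} rows per 31-day month")       # 2,232,000,000
print(f"{rows_per_second:.0f} rows per second on average")  # 833
print(f"{rows_retained:,} rows kept for 3 months")       # 6,624,000,000
```

So three months in the main table means roughly 6.6 billion rows, arriving at an average of about 833 rows per second.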

I must plan for quick queries based on a specific item (its ID) and a date range (usually up to one month), e.g. `select A, B, C from Table where ItemID = 3000 and Date between '2010-10-01' and '2010-10-31 23:59:59.999'`
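
One detail in that sample query is worth checking if the `Date` column is of type `datetime`: SQL Server stores its time part in 1/300-second ticks and rounds values to .000, .003 or .007 seconds, so `'2010-10-31 23:59:59.999'` becomes `2010-11-01 00:00:00.000` and the `BETWEEN` also matches rows stamped exactly at midnight on November 1. A half-open range (`Date >= '2010-10-01' AND Date < '2010-11-01'`) avoids this. The Python below only models that rounding rule to make the effect visible; `to_sql_datetime` is a hypothetical helper, not a SQL Server API:

```python
import math
from datetime import datetime, timedelta

def to_sql_datetime(value: datetime) -> datetime:
    """Model of SQL Server `datetime` rounding (not a real API): the time of
    day is kept in 1/300-second ticks, so fractions snap to .000/.003/.007."""
    midnight = value.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds = (value - midnight).total_seconds()
    ticks = math.floor(seconds * 300 + 0.5)      # round half up to the nearest tick
    return midnight + timedelta(seconds=ticks / 300)

start = datetime(2010, 10, 1)
upper = to_sql_datetime(datetime(2010, 10, 31, 23, 59, 59, 999000))
end_exclusive = datetime(2010, 11, 1)
print(upper)  # 2010-11-01 00:00:00: the .999 literal rounds up to the next day

row_at_midnight = datetime(2010, 11, 1)                      # a November row
in_between = start <= row_at_midnight <= upper               # BETWEEN ... '.999'
in_half_open = start <= row_at_midnight < end_exclusive      # >= ... AND < ...
print(in_between, in_half_open)  # True False
```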

So my question is: how should I design a partitioning structure to achieve that?

Currently, I'm partitioning based on the item's unique identifier (an int) modulo the number of partitions, so that rows are evenly distributed across all partitions. But this has the drawback of requiring an additional column in the table to act as the partitioning column for the partition function, which maps each row to its partition. All that adds a little extra storage. Also, each partition is mapped to a different filegroup.
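
The scheme described above can be modelled in a few lines: the extra column holds `ItemID % N`, and every row of a given item lands in the same partition. A minimal Python sketch, assuming 8 partitions (the number mentioned in the comments below) and item IDs 1 to 50,000; `partition_key` is a hypothetical name:

```python
from collections import Counter

N_PARTITIONS = 8  # assumption: the 8 partitions mentioned in the comments

def partition_key(item_id: int) -> int:
    """Value stored in the extra partitioning column: item id modulo N."""
    return item_id % N_PARTITIONS

# How 50,000 consecutive item ids spread over the partitions
spread = Counter(partition_key(i) for i in range(1, 50_001))
print(sorted(spread.items()))   # each of the 8 keys gets 6,250 items
print(partition_key(3000))      # 0: all rows for ItemID 3000 share one partition
```

The distribution is perfectly even here only because the IDs are consecutive; gaps or skewed activity per item would unbalance the partitions.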

That's some load. Have a read [here](http://sqlblog.com/blogs/paul_nielsen/archive/2007/12/12/10-lessons-from-35k-tps.aspx) about high-volume writes (you have 50k rows *per minute* incoming). I'm intrigued how you'll solve this: I've no experience of that volume/rate of increase at all — gbn, Nov 22 '10 at 18:58
Are you trying to design for write query efficiency or read query efficiency? What kind of read loads do you have? — Roopesh Shenoy, Nov 30 '10 at 12:56
Can you give us some more info on what columns are in the table and what column sizes (width) you return in the query? — RC_Cleland, Dec 03 '10 at 22:58

score 17 · Accepted Answer · edited Sep 23 '16 at 15:24

Partitioning is never done for query performance. With partitioning, performance will always be worse; the best you can hope for is no big regression, never an improvement.

For query performance, anything a partition can do, an index can do better, and that should be your answer: index appropriately.
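
To make "index appropriately" concrete for the query in the question: one ItemID plus a date range is exactly the access pattern served by an index keyed on (ItemID, Date), for example as the clustered index. A SQL Server plan cannot be reproduced here, so this sketch uses Python's built-in sqlite3 as a stand-in: a `WITHOUT ROWID` table is stored in primary-key order, which is SQLite's closest analogue to a clustered index. The table and column names mirror the question's sample query and are otherwise assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID keeps rows physically ordered by (ItemID, Date), similar in
# spirit to a SQL Server clustered index on those two columns.
conn.execute("""
    CREATE TABLE ItemStatus (
        ItemID INTEGER NOT NULL,
        Date   TEXT    NOT NULL,
        A INTEGER, B INTEGER, C INTEGER,
        PRIMARY KEY (ItemID, Date)
    ) WITHOUT ROWID
""")

query = """
    SELECT A, B, C FROM ItemStatus
    WHERE ItemID = ? AND Date >= ? AND Date < ?
"""
plan = conn.execute("EXPLAIN QUERY PLAN " + query,
                    (3000, "2010-10-01", "2010-11-01")).fetchall()
for row in plan:
    # e.g. SEARCH ItemStatus USING PRIMARY KEY (ItemID=? AND Date>? AND Date<?)
    print(row[-1])
```

The plan is a seek on the leading key column followed by a range on the second, so the cost depends on the rows for one item in one month, not on the table size.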

Partitioning is useful for IO path control (distributing data across archive/current volumes) or for fast switch-in/switch-out scenarios in ETL loads. So I would understand if you had a sliding window and partitioned by date, so that you could quickly switch out data that no longer needs to be retained.
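
In the sliding-window case, rows land in monthly partitions, and retiring a month becomes a metadata operation (`ALTER TABLE ... SWITCH PARTITION` to an archive table, then `ALTER PARTITION FUNCTION ... MERGE RANGE`, plus `SPLIT RANGE` for the next month) instead of a huge DELETE. The Python below only models the bookkeeping, assuming a RANGE RIGHT partition function with monthly boundaries; `partition_number` imitates what `$PARTITION` would return, and both it and `slide_window` are hypothetical helpers, not SQL Server code:

```python
from bisect import bisect_right
from datetime import date

# Assumed monthly boundaries of a RANGE RIGHT partition function: partition 1
# holds everything before the first boundary, and each boundary value
# belongs to the partition on its right.
boundaries = [date(2010, 9, 1), date(2010, 10, 1), date(2010, 11, 1)]

def partition_number(d: date) -> int:
    """1-based partition number, mirroring RANGE RIGHT semantics."""
    return bisect_right(boundaries, d) + 1

def slide_window(new_boundary: date) -> date:
    """One sliding-window step: retire the oldest boundary (SWITCH the old
    partition out, then MERGE RANGE) and add the next one (SPLIT RANGE).
    Returns the retired boundary."""
    retired = boundaries.pop(0)
    boundaries.append(new_boundary)
    return retired

p_aug = partition_number(date(2010, 8, 15))   # 1: before the first boundary
p_oct = partition_number(date(2010, 10, 1))   # 3: a boundary value goes right
removed = slide_window(date(2010, 12, 1))     # retires the 2010-09-01 boundary
print(p_aug, p_oct, removed, boundaries)
```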

Another narrow case for partitioning is last-page insert latch contention, as described in *Resolving PAGELATCH Contention on Highly Concurrent INSERT Workloads*.
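
One mitigation covered in that paper is hash partitioning: a computed column such as `ItemID % N` becomes the partitioning column, so inserts that would otherwise all hit the last page of a single B-tree (because the key starts with an ever-increasing timestamp) are spread over N partitions, each with its own last page. A rough Python model, assuming one minute's worth of inserts for 50,000 items and an arbitrary choice of 16 buckets; `inserts_per_tail` is a hypothetical helper:

```python
from collections import Counter

def inserts_per_tail(item_ids, n_buckets: int) -> Counter:
    """Model where one minute of inserts lands when the clustered key starts
    with an ever-increasing timestamp. With n_buckets == 1 every insert
    targets the same last page; with hash partitioning on item_id % n_buckets,
    each bucket is a separate B-tree with its own last page."""
    return Counter(item_id % n_buckets for item_id in item_ids)

one_minute = range(1, 50_001)                 # one status row per item
single = inserts_per_tail(one_minute, 1)
hashed = inserts_per_tail(one_minute, 16)     # 16 buckets is an arbitrary choice
print(max(single.values()), max(hashed.values()))   # 50000 3125
```

This is also, in effect, what the asker's `ItemID mod N` scheme already does, which is why it may fit this scenario.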

Your partition scheme and use case do not seem to fit any of the scenarios in which partitioning would help (maybe the last one, but that is not clear from the description), so most likely it hurts performance.

I compared this partitioned table solution to another, unpartitioned table, and the results were slightly worse for the partitioned solution (99 ms partitioned vs 98 ms unpartitioned). I've used 8 partitions; now I'll try 250 instead, distributed across 2 drives, and see how things play out. — gsb, Nov 23 '10 at 10:49
Poco - two (2) drives? Are there only going to be two drives in the production system? — RC_Cleland, Dec 03 '10 at 23:01

iDevlop · Answer 2 · 2010-11-29T17:23:21.717

I do not really agree with Remus Rusanu. I think partitioning may improve performance if there's a logical reason for it (related to your use cases). My guess is that you could partition ONLY on the itemID. The alternative would be to use the date as well, but if you cannot guarantee that a date range will stay within the boundaries of a single partition (no query is guaranteed to fall within a single month), then I would stick to itemId partitioning.
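
The boundary concern is easy to see with one partition per month: a one-month range aligned to the calendar touches a single partition, while a range of the same length starting mid-month touches two. A small Python sketch, assuming monthly partitions and half-open ranges; `months_touched` is a hypothetical helper:

```python
from datetime import date

def months_touched(start: date, end_exclusive: date) -> list:
    """Monthly partitions (as 'YYYY-MM') overlapped by the half-open
    range [start, end_exclusive), assuming one partition per month."""
    if start >= end_exclusive:
        return []
    months = []
    y, m = start.year, start.month
    while date(y, m, 1) < end_exclusive:
        months.append(f"{y:04d}-{m:02d}")
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return months

print(months_touched(date(2010, 10, 1), date(2010, 11, 1)))    # ['2010-10']
print(months_touched(date(2010, 10, 15), date(2010, 11, 15)))  # ['2010-10', '2010-11']
```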

If there are only a few fields you need to compute, another option is a covering index: define an INDEX on your main differentiating field (the itemId) which INCLUDEs the fields you need to compute.

-- dbo.ItemStatus is a placeholder name: the question never names the table
CREATE INDEX idxTest ON dbo.ItemStatus (itemId) INCLUDE (quantity);
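
In SQL Server, INCLUDE stores extra columns in the leaf level of the index, so a query that needs only the key and the included columns never touches the base table. SQLite (used here through Python's built-in sqlite3 as a runnable stand-in) has no INCLUDE clause, so this sketch approximates it by appending `quantity` to the index key; the table name and the extra `notes` column are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ItemStatus (itemId INTEGER, quantity INTEGER, notes TEXT)"
)
# No INCLUDE in SQLite: putting quantity after the key column approximates
# SQL Server's "ON (itemId) INCLUDE (quantity)".
conn.execute("CREATE INDEX idxTest ON ItemStatus (itemId, quantity)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(quantity) FROM ItemStatus WHERE itemId = ?",
    (3000,),
).fetchall()
for row in plan:
    print(row[-1])  # e.g. SEARCH ItemStatus USING COVERING INDEX idxTest (itemId=?)
```

The word COVERING in the plan means the query is answered from the index alone; selecting `notes` as well would force a lookup into the table.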

Manu · Answer 3 · 2010-12-05T18:39:23.557

Applicative (application-level) partitioning actually CAN be beneficial for query performance. In your case you have 50K items and about 2G rows per month. You could, for example, create 500 tables, each named status_nnn where nnn is between 001 and 500, and "partition" your item statuses equally among these tables, where nnn is a function of the item ID. This way, given an item ID, you can limit your search a priori to 0.2% of the whole data (ca. 4M rows).
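
The answer leaves the function of the item ID open; a modulo, as in the question's own scheme, is one simple choice. A minimal Python sketch, assuming `item_id % 500` mapped onto 001 to 500 (`status_table_for` is a hypothetical name), together with the per-table numbers it implies for a 2,232,000,000-row month:

```python
N_TABLES = 500

def status_table_for(item_id: int) -> str:
    """Name of the status_nnn table (001-500) that holds a given item."""
    return f"status_{item_id % N_TABLES + 1:03d}"

print(status_table_for(3000))   # status_001
print(status_table_for(3499))   # status_500

ITEMS, ROWS_PER_MONTH = 50_000, 2_232_000_000
print(ITEMS // N_TABLES, ROWS_PER_MONTH // N_TABLES)   # 100 4464000
```

With 100 items per table, a single-item query only ever scans one table of about 4.5 million rows per month.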

This approach has a lot of disadvantages, as you'll probably have to deal with dynamic SQL and other unpleasant issues, especially if you need to aggregate data from different tables. BUT it will definitely improve performance for certain queries, such as the ones you mention.
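
The dynamic SQL burden appears as soon as a query spans items stored in different tables: table names have to be spliced into the statement text, because they cannot be passed as parameters, and the per-table results must be stitched together with UNION ALL. A Python sketch that only generates the statement, assuming the same hypothetical `item_id % 500` routing; on SQL Server the equivalent would usually be built in T-SQL and run with `sp_executesql`, with `QUOTENAME` around the table names:

```python
from collections import defaultdict

N_TABLES = 500

def status_table_for(item_id: int) -> str:
    """Hypothetical routing: item id modulo 500, mapped onto status_001..status_500."""
    return f"status_{item_id % N_TABLES + 1:03d}"

def build_query(item_ids, start, end_exclusive):
    """One SELECT per status_nnn table involved, joined with UNION ALL.
    Table names come only from status_table_for(); all values stay as ?
    placeholders so they are bound as parameters, never spliced in."""
    by_table = defaultdict(list)
    for item_id in sorted(set(item_ids)):
        by_table[status_table_for(item_id)].append(item_id)
    parts, params = [], []
    for table, ids in sorted(by_table.items()):
        marks = ", ".join("?" * len(ids))
        parts.append(f"SELECT ItemID, A, B, C FROM {table} "
                     f"WHERE ItemID IN ({marks}) AND Date >= ? AND Date < ?")
        params.extend(ids + [start, end_exclusive])
    return " UNION ALL ".join(parts), params

sql, params = build_query([3000, 3500, 3001], "2010-10-01", "2010-11-01")
print(sql)
print(params)
```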

Essentially, applicative partitioning is similar to creating a very wide and flat index, optimized for very specific queries, without duplicating the data.

Another benefit of applicative partitioning is that you could in theory (depending on your use case) distribute your data among different databases and even different servers. Again, this depends very much on your specific requirements, but I've seen and worked with huge data sets (billions of rows) where applicative partitioning worked very well.
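
Extending the routing to several databases or servers only adds a second lookup step. A sketch under invented assumptions: the same hypothetical `item_id % 500` table routing, with the 500 tables split into contiguous ranges over four databases named `StatusDb1` to `StatusDb4` (the layout and `location_for` are both hypothetical):

```python
N_TABLES, N_DATABASES = 500, 4   # assumed layout: 125 tables per database

def location_for(item_id: int) -> tuple:
    """(database, table) holding an item: the table comes from the modulo
    routing, and contiguous ranges of tables map to one database."""
    table_no = item_id % N_TABLES + 1              # 1..500
    per_db = N_TABLES // N_DATABASES               # 125
    db_no = (table_no - 1) // per_db + 1           # 1..4
    return f"StatusDb{db_no}", f"status_{table_no:03d}"

print(location_for(3000))   # ('StatusDb1', 'status_001')
print(location_for(3499))   # ('StatusDb4', 'status_500')
```

Keeping the table-to-database map separate from the item-to-table function means a table can later be moved to another server by changing only the second step.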
