
I'm probably going to implement partitioning on a huge table (billions of rows).

Each row holds a status reading for a particular device and is inserted once a minute, so there will be 1,440 (24 × 60) rows per day per device. Each device has a unique ID (DeviceID).

I'm considering partitioning on DeviceID MOD {TheNumberOfPartitionsThatIWant}; I think 250 partitions is a good compromise. This strategy distributes the devices evenly across the partitions, and a query for a particular device only needs to touch one partition rather than all 250.

The problem is that I need to add an extra column to my table just to indicate which partition each row belongs to, so that I can define the table on a partition scheme using that column. It would be much better to supply (DeviceID MOD 250) directly to the partition scheme instead of adding a whole column for such a simple expression. Is there a workaround for that?
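To illustrate, here is a minimal sketch of the setup described above (all object names are placeholders, not actual schema); the PartitionKey column is the extra column I would like to avoid:

```sql
-- Placeholder names; trimmed to three boundary values for brevity.
-- Boundary values 0..248 under RANGE LEFT would yield the full 250 partitions.
CREATE PARTITION FUNCTION pfDeviceMod (int)
AS RANGE LEFT FOR VALUES (0, 1, 2 /* ..., 248 */);

CREATE PARTITION SCHEME psDeviceMod
AS PARTITION pfDeviceMod ALL TO ([PRIMARY]);

CREATE TABLE dbo.DeviceStatus (
    DeviceID     int         NOT NULL,
    StatusTime   datetime    NOT NULL,
    StatusValue  varchar(50) NOT NULL,
    PartitionKey int         NOT NULL  -- must be set to DeviceID % 250 on every insert
) ON psDeviceMod (PartitionKey);
```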

gsb
  • I saw in a comment on your previous [question](http://stackoverflow.com/questions/4249073/partitioning-for-query-performance-in-sql-server-2008) that you have **2** drives. Use 2 partitions, no? If you want to handle these kinds of volumes, you don't scrimp *at all* on hardware. – gbn Nov 25 '10 at 12:01
  • That was just an experiment. In the final solution, we will probably use a DAS or SAN configuration. – gsb Nov 25 '10 at 12:07
  • What is the purpose of your partitioning strategy? Is it to achieve partition elimination in queries, to support fast data loads, or to allow selective archiving of old data? You could partition on a computed column based on DeviceID to create 250 ranges, but it's not obvious that would be a sensible partitioning strategy, given that all partitions would presumably be updated every day. Also, what is the expected size of the table in MB/GB per day? – nvogel Nov 25 '10 at 12:35
  • The purpose is to get partition elimination in queries. But during my tests, querying the partitioned table was slower than querying the non-partitioned one. The indexes are in place, and the query touches just one partition (I checked the actual execution plan), but it's still slower. Why? – gsb Nov 25 '10 at 13:48

3 Answers


You can partition on a persisted computed column based on that expression. However, whilst that does work, the benefits are going to be limited, and I would want to see a scaled test of it. It also requires every query against the table to include the same expression in its WHERE clause to get partition elimination.
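A minimal sketch of that approach, reusing the placeholder names from the question but assuming the table was created without the explicit column:

```sql
-- Hypothetical table name; the column is computed and stored automatically,
-- so the application no longer has to populate it on insert. The expression
-- must be deterministic and the column marked PERSISTED to qualify as a
-- partitioning column.
ALTER TABLE dbo.DeviceStatus
ADD PartitionKey AS (DeviceID % 250) PERSISTED NOT NULL;
```

Note that partition elimination will still only occur for queries whose WHERE clause filters on PartitionKey (or the identical DeviceID % 250 expression).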

The key point is made by dportas: partitioning is designed to make data ageing a trivial operation. At some point the data in the system will be too old or of no further value and will need to be purged, and with a DeviceID-based scheme you are reduced to long-running deletes to remove it.
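To make the contrast concrete, here is a sketch of what date-based partitioning (a hypothetical variant, not the DeviceID scheme proposed in the question) buys you at purge time: switching out the oldest partition is a metadata-only operation instead of a long-running DELETE.

```sql
-- Hypothetical: assumes dbo.DeviceStatus is partitioned by StatusTime and
-- dbo.DeviceStatus_Old has an identical structure on the same filegroup.
ALTER TABLE dbo.DeviceStatus
SWITCH PARTITION 1 TO dbo.DeviceStatus_Old;  -- oldest partition, effectively instant

TRUNCATE TABLE dbo.DeviceStatus_Old;         -- cheap purge instead of a DELETE
```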

In terms of spreading the data across multiple disks, partitioning does not provide anything significant beyond the existing filegroup/file facilities.

Andrew

Suitable indexing will probably get you better results than partitioning. As @Andrew says, partitioning is primarily for fast data loading and unloading (that is, switching partitions in and out).
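As a sketch of what that indexing could look like (index and column names assumed, not taken from the question): a clustered index leading on DeviceID keeps each device's minute-by-minute rows physically together, so a seek touches only that device's range of the B-tree, without any partitioning at all.

```sql
-- Assumed names; clusters rows by device, then by time within each device.
CREATE CLUSTERED INDEX CIX_DeviceStatus_Device
ON dbo.DeviceStatus (DeviceID, StatusTime);
```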

Philip Kelley

I created a persisted computed column; that solves the problem.