
We have a database that is currently 1.5 TB in size, and it grows by about a gigabyte of data every day, loaded from a text file of roughly 5 million records.

The table has many columns, but a notable one is START_TIME, which holds the date and time.

We run many queries against date ranges.

We keep 90 days' worth of records inside our database, and we have a larger table which has ALL of the records.

Queries run against the 90 days' worth of records are pretty fast, but queries run against ALL of the data are slow.

I am looking for some very high-level answers and best practices.

We are THINKING about upgrading to SQL Server Enterprise and using table partitioning, splitting the partitions by month (12) or by day of month (31).

What's the best way to do this?

Virtual or physical? A SAN? How many disks (SAS?), how many partitions, etc.?


1 Answer


You don't want to split by day of month, because then you would touch all 31 partitions every month. The point of partitioning is to let you avoid touching certain data.

Why do you want to partition? Can you clearly articulate why? If not (which I assume), you shouldn't do it. Partitioning does not improve performance per se: it improves performance in some scenarios and costs performance in others.

You need to understand what you gain and what you lose. Here is what you gain:

  • Fast deletion of whole partitions
  • Read-Only partitions can run on a different backup-schedule

Here is what you lose:

  • Productivity
  • Standard Edition
  • Lower performance for non-aligned queries (in general)

Here is what stays the same:

  • Performance for partition-aligned queries and indexes

If you want to partition, you will probably want to do it on date or month, but in a continuous way. So don't make your key month(date); make it year(date) + '-' + month(date), so that January 2012 does not land in the same partition as January 2011. Never touch old partitions again.
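For illustration, here is a minimal sketch of what a continuous monthly layout could look like in T-SQL. The function, scheme, and table names are made up; only START_TIME comes from the question.

    -- Hypothetical monthly partition function with continuous
    -- (year + month) boundaries; extend the boundary list over time.
    CREATE PARTITION FUNCTION pf_MonthlyByStartTime (datetime)
    AS RANGE RIGHT FOR VALUES
        ('2012-01-01', '2012-02-01', '2012-03-01', '2012-04-01');

    -- Map every partition to one filegroup for simplicity;
    -- each month could also get its own filegroup.
    CREATE PARTITION SCHEME ps_MonthlyByStartTime
    AS PARTITION pf_MonthlyByStartTime
    ALL TO ([PRIMARY]);

    -- The table is then created on the scheme, partitioned by START_TIME:
    -- CREATE TABLE dbo.CallRecords (..., START_TIME datetime NOT NULL, ...)
    --     ON ps_MonthlyByStartTime (START_TIME);

With RANGE RIGHT, each boundary value is the first instant of its month, so rows sort into the partition for their own year and month rather than cycling through a fixed set of buckets.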

If your old partitions are truly read-only, put each of them in a read-only filegroup and exclude it from your backups. That will give you really fast and much smaller backups.
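As a sketch (the database and filegroup names are placeholders), freezing a historical filegroup and skipping it in routine backups could look like this:

    -- Mark the historical filegroup read-only.
    ALTER DATABASE CallDB MODIFY FILEGROUP FG_2011 READ_ONLY;

    -- Take one final backup of the now read-only filegroup...
    BACKUP DATABASE CallDB FILEGROUP = 'FG_2011'
        TO DISK = 'D:\Backups\CallDB_FG_2011.bak';

    -- ...after which routine backups can cover only the active data:
    BACKUP DATABASE CallDB READ_WRITE_FILEGROUPS
        TO DISK = 'D:\Backups\CallDB_active.bak';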

Because you only keep 90 days of data in the active table, you probably want one partition per day there. Every day at midnight you drop the oldest partition and alter the partition function to make room for a new day.
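A sliding-window sketch, assuming daily partitions and placeholder object names (dbo.CallRecords, dbo.CallRecords_Staging, pf_DailyByStartTime, ps_DailyByStartTime):

    -- 1. Switch the oldest partition out into an empty table with an
    --    identical structure; this is a metadata-only operation.
    ALTER TABLE dbo.CallRecords
        SWITCH PARTITION 1 TO dbo.CallRecords_Staging;
    TRUNCATE TABLE dbo.CallRecords_Staging;

    -- 2. Remove the now-empty oldest boundary...
    ALTER PARTITION FUNCTION pf_DailyByStartTime()
        MERGE RANGE ('2012-01-01');

    -- 3. ...and add a boundary for the new day.
    ALTER PARTITION SCHEME ps_DailyByStartTime NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pf_DailyByStartTime()
        SPLIT RANGE ('2012-04-02');

The SWITCH and MERGE/SPLIT steps touch only metadata, so the nightly rollover stays fast regardless of table size.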

There is not enough information here to answer anything about hardware.

  • We keep only 90 days right now, because it makes our queries faster. – Sarfaraz Jamal Mar 03 '12 at 20:53
  • Ok, so you want to keep all data. Partitioning will not help with querying here. It will only allow you to kill old data fast. Inserts and selects are just as fast as before, if not slower. – usr Mar 03 '12 at 20:54
  • The goal is to make queries against older data faster, if possible. Right now we limit all of our reports to the last 90 days' worth of data, but if we could get decent query performance across ALL of the records, then we could keep the data together in one partitioned table. We do not have to partition it, but we thought it would make the queries faster. – Sarfaraz Jamal Mar 03 '12 at 20:55
  • No, it won't. I suggest you just try to create the right indexes. You could create an index on the datetime column and include _all_ the necessary other columns with it. This is called a covering index and will help a lot (see the sketch after these comments). I encourage you to do some research around index design. There might be an easy solution available to you. – usr Mar 03 '12 at 20:57
  • We already have a covering index that includes phone number, start time, and event label, which are usually the three things we are looking for when we pull the query, and it runs reasonably fast on the 90 days' worth of data. In the history table we have a covering key on those three items as well. – Sarfaraz Jamal Mar 03 '12 at 21:04
  • Ok, sounds good. If you have more concrete questions about partitioning I am glad to help. For general performance tuning I recommend asking a new question. – usr Mar 03 '12 at 22:57
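A sketch of the covering index discussed in the comments above; the table and column names are guesses based on the columns the asker mentions:

    -- Hypothetical covering index: seek on START_TIME, with the other
    -- frequently queried columns carried at the leaf level so the
    -- base table never has to be touched.
    CREATE NONCLUSTERED INDEX IX_CallRecords_StartTime_Covering
        ON dbo.CallRecords (START_TIME)
        INCLUDE (PHONE_NUMBER, EVENT_LABEL);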