3

I am dealing with database tables with tens of millions of rows (with the potential to go to the hundreds of millions over time), and am looking at implementing database partitioning to try to keep performance stable as the row count increases. This is what I'd like to do:

Say I have a table that stores animals. One of the fields is the AnimalType (i.e. Bird/Fish/Cat/Dog). I'd like each AnimalType to be a separate partition, because 99% of queries only ever relate to one AnimalType & there is roughly an equal amount of AnimalTypes in the table (i.e. 1000 fish, 1000 birds, 1000 dogs) so it means the partitions should be nice and evenly spread. However, there are a lot of animal types, and I don't want to go and manually create the hundreds of partitions for each AnimalType, and then every time a new AnimalType is entered have to create a new partition.

Therefore, what I would like, is some way to tell SQL Server to partition based on AnimalType. If there is a partition already for the AnimalType, use that partition, otherwise SQL Server will automatically create a new partition.

It sounds simple enough, but I cannot seem to find a way to do this. Is it possible?

Alternatively, what are some other methods to keep table access speeds nice and fast? I'd like to avoid anything that is just manually moving stuff into more tables, such as moving older records into a History style table, as there is the potential for queries to need data from the full data set and hence this won't actually help. I already have some basic Indexes which help significantly.

Tom H
  • 46,766
  • 14
  • 87
  • 128
mike
  • 3,146
  • 5
  • 32
  • 46

2 Answers2

3

This is a very old question so some updated information is probably in order. First of all, to answer the original question, yes, dynamic partitioning is possible by way of scheduled jobs:

Marlon Ribunal article, see section on dynamic paritioning

How to automate Table Partitioning in SQL Server by Jignesh Raiyani at SQL Shack

I also want to add that partitions can improve query performance in certain scenarios. For me, with clustered columnstore indexes, I was able to leverage partitioning to boost segment elimination. See the SO post about it from 2018:

Partitioning columnstore tables for performance

One must not enter into a partitioning scheme lightly. It is extra complexity and overhead that should be proven to enhance your design before implementing.

Cyndi Baker
  • 670
  • 8
  • 15
2

Partitioning is a solution for storage problems, ie. determine on what filegroup data is located based on some field value. On its own, it gives no real performance benefit, in fact it actually slows down queries most times because new partition location operators need to be added. The only way to enforce queries to consider only one partition is the $PARTITION syntax, and this cannot be used in real world applications scenarios. Queries that opt to look up only one partition do so solely based on the index ranges, and would scan exactly the same number of records with or without partitioning.

the only time when partitioning has a performance benefit is for administration activities like partition switch in and switch out from a table or bulk import operations.

Performance benefits can come only from proper indexes and carefully designed queries.

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
  • 1
    *the only time when partitioning has a performance benefit is for administration activities* - doesn't partitioning actually reduce lock contention? Should't it allow better throughput by allowing parallel operations? If an index is also aligned with table partitions, it should only increase performance during heavy usage. Additionally, if partitions are split along multiple disks, it will allow even greater throughput. Maintenance operations (index rebuild and reorganize) can be executed against individual partitions, which will also reduce locking and execute jobs faster. – vgru Jun 20 '12 at 18:00
  • @Groo: locking reduction - no. Parallel operations - no (actually it hurts parallelism). split io - no, single partition can achieve the same and better with multiple file per filegroup. Maintenance against partitions: only offline. online operations are per table only. In addition partitioning introduces plan complexity, significant increase in memory footprint and last but not least, read ahead stall. – Remus Rusanu Jun 20 '12 at 20:08
  • Can you back that up? [According to MSDN](http://download.microsoft.com/download/D/B/D/DBDE7972-1EB9-470A-BA18-58849DB3EB3B/PartTableAndIndexStrat.docx), *SQL Server's table partitioning is designed to make [maintenance operations] on large tables easier, as well as improve performance of properly filtered queries [...] There are **enhanced parallel query operations** on partitioned tables that may provide better CPU utilization and query performance*. – vgru Jun 20 '12 at 20:23
  • The same link also states that *"you can reorganize partitioned indexes and specify individual partitions, lists of partitions, and even ranges of partitions"* and immediately after that explicitly states that *"reorganizing an index is always an **online** operation"*. Regarding parallelism, again an MSDN link: [Partitioned Table Parallelism](http://technet.microsoft.com/en-us/magazine/2008.04.dwperformance.aspx#s10). – vgru Jun 20 '12 at 20:24
  • Don't get me wrong, I am not a DBA, I cannot claim I know what I am talking about. I just recently started checking our options after we started seeing long lockouts during maintenance plans (online index reorganizing) scheduled during weekends. We have a database with a huge table containing 10fps measurements for ~20 devices for a period of a year, and we noticed that our inserts sometimes timeout, even with ~10 min. command timeouts (for operations which usually take around 80ms to complete). Our reasoning was simply to decrease locks' duration during scheduled plans. – vgru Jun 20 '12 at 20:30
  • Locking against online operations must be SCH_M vs. SCH_S locking at the end of the online operation, see [Final Phase](http://technet.microsoft.com/library/Cc966402). 10m waits implies the online operation was waiting for a query to drain and the query was lasting +10 minutes, blocking online operation final phase which in turn blocked (due to the lock wait list) *all* operations. The problem is the long running query. – Remus Rusanu Jun 20 '12 at 20:43
  • As for the query performance enhancements often quoted with regards to partitioning: none of those those are compared to a *range scan on a non partitioned clustered with the same key as the partitioning column*. Ie. things are *better than bad*, but not *as good as they can be*. – Remus Rusanu Jun 20 '12 at 21:00
  • http://www.brentozar.com/archive/2012/03/how-decide-if-should-use-table-partitioning/ is a good read – Remus Rusanu Jun 20 '12 at 21:01
  • thanks a lot for the link. It does mention "long-running index maintenance jobs" as one of the issues for considering partitioning (if I got them right), but I cannot seem to find the relation between job duration and partitioning explained anywhere. I hoped I could get some benefits without having to implement sliding window schemes, but I am not sure if that will happen by itself. Regarding locking, I have no control of the locks, I just do a batch insert (which usually takes ~80ms). On Mondays I usually receive error mails complaining that operations timed out. – vgru Jun 21 '12 at 06:44
  • @Groo: because Monday is the day Reports are running :) – Remus Rusanu Jun 21 '12 at 06:51