Background
I'm migrating an application away from a schema with a single event table holding 0.5-1.3 billion records to partitioned views over 30-180 identical underlying tables, each holding a fraction of that number of records. This considerably eases a number of problems around managing and querying that many records. I'm using partitioned views instead of table partitioning because not all my users have access to the Awesome Edition of SQL Server 2012.
The partition function is based on the Cartesian product of the day of the event and one attribute of the event. In other words, all events for 2015 Jun 01 with attribute "foo" go into a table like "Event20150601_foo", all events with attribute "bar" go into "Event20150601_bar", and for the following day events go into "Event20150602_foo" and "Event20150602_bar", and so on. There are typically 2-3 values of the attribute and 15-60 days of events, which maps to the typical range of 30-180 distinct tables.
The basic structure of each table is a composite clustered primary key consisting of EventId (bigint) and PartitionKey (int), followed by a handful of other unindexed columns. The EventId is unique and monotonically increasing across all tables, implemented with a sequence. The PartitionKey is constant within each partition table, so the check constraint on each table is simply "CHECK (PartitionKey = x)", where x is defined per partition table. This allows me to search on EventId via a clustered index seek on every table even when I cannot provide the partition key to narrow the scope of the search. When I can provide both EventId and PartitionKey, the query is very efficient, since the optimizer performs an index seek on only the single table that matches the partition key, which is a big performance win.
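For concreteness, here is a minimal sketch of the layout with just two partition tables for one day; the table names, the Payload column, and the sequence name are illustrative stand-ins, not the real schema:

```sql
-- Sequence that backs EventId across every partition table.
CREATE SEQUENCE dbo.EventIdSequence AS bigint START WITH 1;
GO

-- One partition table per (day, attribute); PartitionKey is constant per table,
-- which the CHECK constraint tells the optimizer about.
CREATE TABLE dbo.Event20150601_foo
(
    EventId      bigint NOT NULL DEFAULT (NEXT VALUE FOR dbo.EventIdSequence),
    PartitionKey int    NOT NULL,
    Payload      varchar(100) NULL,  -- stand-in for the handful of unindexed columns
    CONSTRAINT PK_Event20150601_foo PRIMARY KEY CLUSTERED (EventId, PartitionKey),
    CONSTRAINT CK_Event20150601_foo CHECK (PartitionKey = 1)
);

CREATE TABLE dbo.Event20150601_bar
(
    EventId      bigint NOT NULL DEFAULT (NEXT VALUE FOR dbo.EventIdSequence),
    PartitionKey int    NOT NULL,
    Payload      varchar(100) NULL,
    CONSTRAINT PK_Event20150601_bar PRIMARY KEY CLUSTERED (EventId, PartitionKey),
    CONSTRAINT CK_Event20150601_bar CHECK (PartitionKey = 2)
);
GO

-- The partitioned view is a plain UNION ALL over all partition tables.
CREATE VIEW dbo.PartitionedView
AS
SELECT EventId, PartitionKey, Payload FROM dbo.Event20150601_foo
UNION ALL
SELECT EventId, PartitionKey, Payload FROM dbo.Event20150601_bar;
GO
```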
Experiment
It would make my life even easier if I could go even more granular and partition based on the day of the event and a broader combination of attributes (500-5,000 instead of the aforementioned 2-3), which would then require somewhere in the range of 10,000-50,000 different partitions. I tested this out on a database with 20,000 different tables and found, unfortunately, that the running time for different operations goes up at a greater-than-linear rate with the number of tables. Per the query statistics, most of the time was spent parsing and compiling the query rather than executing it.
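The test setup was scripted with dynamic SQL along these lines; the @TableCount value, the EventPart naming, and the column list here are placeholders rather than the actual generator:

```sql
DECLARE @TableCount int = 2000;          -- vary this: 100, 1000, 2000, 4000, 16000...
DECLARE @i int = 1,
        @sql nvarchar(max),
        @union nvarchar(max) = N'';

WHILE @i <= @TableCount
BEGIN
    -- Create one empty partition table with its CHECK constraint.
    SET @sql = N'CREATE TABLE dbo.EventPart' + CAST(@i AS nvarchar(10)) + N' (
        EventId      bigint NOT NULL,
        PartitionKey int    NOT NULL CHECK (PartitionKey = ' + CAST(@i AS nvarchar(10)) + N'),
        Payload      varchar(100) NULL,
        PRIMARY KEY CLUSTERED (EventId, PartitionKey));';
    EXEC sys.sp_executesql @sql;

    -- Accumulate the UNION ALL branch for the view definition.
    SET @union += CASE WHEN @i = 1 THEN N'' ELSE N' UNION ALL ' END
                + N'SELECT EventId, PartitionKey, Payload FROM dbo.EventPart' + CAST(@i AS nvarchar(10));
    SET @i += 1;
END;

-- The "Create the partitioned view" timing below is essentially this single statement.
SET @sql = N'CREATE VIEW dbo.PartitionedView AS ' + @union;
EXEC sys.sp_executesql @sql;
```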
Here are the running times I found for different operations (pardon the rudimentary table):
- Create the partitioned view
  - 100 tables: 50 ms
  - 1,000 tables: 800 ms
  - 2,000 tables: 2,660 ms
  - 4,000 tables: 10,000 ms
  - 16,000 tables: 225,000 ms
- select * from PartitionedView where EventId = x (scans all tables)
  - 100 tables: 78 ms parse/compile; 25 ms exec
  - 1,000 tables: 3,500 ms parse/compile; 160 ms exec
  - 2,000 tables: 15,000 ms parse/compile; 500 ms exec
  - 4,000 tables: 68,000 ms parse/compile; 2,000 ms exec
  - 16,000 tables: cancelled after more than 10 minutes of parse/compile time!
- select * from PartitionedView where (EventId = x) and (PartitionKey = y) (scans only one table)
  - 100 tables: 74 ms parse/compile; 1 ms exec
  - 1,000 tables: 2,500 ms parse/compile; 15 ms exec
  - 2,000 tables: 11,000 ms parse/compile; 10 ms exec
  - 4,000 tables: 50,000 ms parse/compile; 16 ms exec
  - 16,000 tables: cancelled after more than 10 minutes of parse/compile time!
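The parse/compile versus execution split above comes from the session query statistics, captured along these lines (the literal EventId and PartitionKey values are placeholders):

```sql
-- Only on a test instance: flush cached plans so compile time is measured every run.
DBCC FREEPROCCACHE;

SET STATISTICS TIME ON;

SELECT * FROM dbo.PartitionedView WHERE EventId = 123456789;                        -- touches every table
SELECT * FROM dbo.PartitionedView WHERE EventId = 123456789 AND PartitionKey = 42;  -- single table

SET STATISTICS TIME OFF;
-- The Messages tab reports "SQL Server parse and compile time" and
-- "SQL Server Execution Times" for each statement.
```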
Question
Should I be scolded for even considering using that many tables? If not, is there a way I can reduce the query parse & compile times when there are large numbers of tables present?