1

We are importing daily data (~600K rows) from Google GA4 into our data warehouse.

We are trying to decide how we are going to store the data. We're considering either a partitioned table or a partitioned view that queries daily tables, with the main goal being to improve query speed.

So far, we've implemented a partitioned view, with a primary key on the id and event_date columns. We were happy to see that the estimated execution plan shows that a query that filters by the date range only runs a clustered index scan on the daily tables that are in the date range specified.

enter image description here

However, we were surprised to see how long it took for a query to return the total number of rows for three days (23 seconds) vs running a query against each day individually (less than 1 second).

We were planning on running a test by creating a partitioned table, but wanted to check to see if there were any recommendations or best practices we should know about.

enter image description here

Matthew Walk
  • 1,014
  • 2
  • 16
  • 36
  • The execution plan that you have looks like an estimated plan (neither times nor "estimated vs actual" row counts on the operators). In the case where you actually ran the query, can you collect an actual execution plan to tell which operators are accumulating the most time? – Ben Thul May 02 '23 at 01:18
  • *"the main goal being to improve query speed."* neither of these options is going to do that. Partitioned tables are primarily for management purposes (bulk deletion) and partitioned views are to combine disparate tables into a single view. Why do you think either of these is going to help, vs just indexing your single table on the correct column(s)? – Charlieface May 02 '23 at 01:38
  • RE "However, we were surprised to see how long it took for a query to return the total number of rows for three days (23 seconds)": did you run these queries only once or multiple times? It could simply be the case that data got cached in memory. – Alex May 02 '23 at 01:43
  • 1
    just create a clustered index on date. partitioning will most likely decrease speed a bit if anything else. – siggemannen May 02 '23 at 11:39
  • Hi @BenThul I've updated the image to include the actual execution plan. – Matthew Walk May 02 '23 at 20:17
  • Hi @Charlieface, we were concerned about adding all of the rows to one table because the table would become very large. In the case of event_param values, we are seeing approximately 6 million rows per day, about 180 million per month. – Matthew Walk May 02 '23 at 20:21
  • Hi @Alex, the query was run multiple times, on different days, with similar results. When running the query against the view today, it took over a minute to execute. Running the queries against the individual days took less than a second, again. – Matthew Walk May 02 '23 at 20:22
  • Hi @siggemannen, are you suggesting inserting all of the data into one table? The main concern is the number of records, around 180 million per month. – Matthew Walk May 02 '23 at 20:23
  • 1
    Hmm, then it might make sense to partition by date and create correlated PK by date too, not for speed only but for maintenance + speed combo. Backuping up the whole enchilada or moving dates in an out will be much faster if you partition. Are you gonna clear up old data? Or what kind of operations are you planning on supporting? – siggemannen May 02 '23 at 20:42
  • 1
    @MatthewWalk - in re: running multiple times, is the view slow twice in a row? Like if you execute two of your count queries that go against the view in the same batch, do they both run in over a minute? Or does the first run in a minute and the second one run fast? Similarly, are you always selecting from the view and then from the individual tables in your testing? Or do you sometimes change the order? If the latter, is the view always slower? – Ben Thul May 02 '23 at 21:36
  • 1
    180 million is a fair bit, but if it's well indexed it won't matter. What will matter is when you try and manage it: updates/deletes and index rebuilds are going to be problematic. Yes, partitioned tables might be the answer, but we need to see some realistic queries first, showing how you intend to query the table. Please show that (as text not images) along with the full table and index definitions. – Charlieface May 02 '23 at 22:38

0 Answers0