
I'm attempting to perform temporal analytic queries against tables with well over 1M rows. A typical question is how many rows satisfy "some criteria" over an arbitrary time window broken into arbitrary periods, such as the last 4 months grouped into 3-day periods.

Our current solution is to perform one count query per time period, so the above example would produce 40 different queries, which makes the performance unacceptable.
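
To illustrate (using the same my_big_table / time names as the query further down, and omitting the actual filter criteria), each period currently gets its own query along these lines:

-- one of the ~40 per-period queries we issue today;
-- the real "some criteria" predicate is omitted here
select count(*)
  from my_big_table
 where time >  '2017-01-15 00:00:00'
   and time <= '2017-01-18 00:00:00';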

One approach I've taken is to create a temporary table, like so:

create temporary table time_series (
    lower_limit timestamp default current_timestamp, 
    upper_limit timestamp default current_timestamp
);

insert into time_series (lower_limit, upper_limit) values
    ('2017-01-15 00:00:00', '2017-01-18 00:00:00'), 
    ('2017-01-18 00:00:00', '2017-01-21 00:00:00'), 
    ...
    ('2017-05-09 00:00:00', '2017-05-12 00:00:00'), 
    ('2017-05-12 00:00:00', '2017-05-15 00:00:00');

select ts.upper_limit, count(mbt.time) from time_series ts 
join my_big_table mbt on 
(
    mbt.time >  ts.lower_limit and 
    mbt.time <= ts.upper_limit
)
group by ts.upper_limit
order by ts.upper_limit;

drop table time_series;

Produces...

+---------------------+-----------------+
|     upper_limit     | count(mbt.time) |
+---------------------+-----------------+
| 2017-01-18 00:00:00 |           65890 | 
| 2017-01-21 00:00:00 |           98230 | 
| ...                 |                 | 
| 2017-05-12 00:00:00 |           57690 | 
| 2017-05-15 00:00:00 |            2349 | 
+---------------------+-----------------+

This is much more performant than our current solution, but the problem is that I don't own the database. The tables could reside in Oracle, SQL Server, MySQL, or PostgreSQL, and I will likely have only SELECT privileges, so the ability to create and drop temporary tables is not guaranteed. For instance, I ran the SQL above in MySQL, but I had to grant myself the CREATE TEMPORARY TABLES privilege to do so.

Is there a way to create a "synthetic table" (not sure what else to call it), usable within the scope of a single query, that would hold the fixed list of timestamps as period boundaries, similar to what I have above but without the temporary table?

chaserb
  • Please tag only the dbms you currently use. – Ilyes May 10 '17 at 20:50
  • I think you can create table variables in all of these scenarios. – S3S May 10 '17 at 20:52
  • For all databases, except for mysql, you could do this with a CTE. You may end up with a couple of different performant queries that are executed depending on the RDBMS that you are querying. – JNevill May 10 '17 at 20:55
  • You might be able to do this through subquery factoring (`WITH` clause), but the row-generating SQL would vary based on the target RDBMS. I assume that's acceptable? – Mick Mnemonic May 10 '17 at 20:57
  • Which MySQL version are you using? – Mick Mnemonic May 10 '17 at 22:14
  • @Mick Mnemonic, yes row-generating variants are acceptable. We're developing against MySQL 5.7, but we will likely encounter earlier versions in the wild that we'll need to support. We haven't drawn a line in the sand yet on the minimum version. – chaserb May 11 '17 at 00:36

2 Answers

0

Here is an (ugly!) query that generates a sequence of 125 integers [0-124] by doing cross joins.

SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
  FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
  JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
  JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C

You can add an ORDER BY and a LIMIT clause to get just the integers [0-39] out of it, like this:

SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
  FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
  JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
  JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C
 ORDER BY seq
 LIMIT 40

Then you can use this nasty slug of SQL to turn that sequence into the 3-day date ranges themselves.

SELECT daterange.start_date + INTERVAL (3 * sequence.seq) DAY     AS lower_limit,
       daterange.start_date + INTERVAL (3 * sequence.seq + 3) DAY AS upper_limit
  FROM (
          SELECT DATE('2017-01-15') AS start_date
       ) daterange
  JOIN (
         SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
          FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
          JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
          JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C
         ORDER BY seq
         LIMIT 40
       ) sequence

That has the same contents as the temp table you're trying to create. So you can then use this as a subquery -- a virtual table -- to get the result you're going for.
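
Putting it together with your original aggregation (same table and column names as in the question), the whole thing would look roughly like this:

SELECT ranges.upper_limit, COUNT(mbt.time)
  FROM (
         -- the date-range generator from above, used as a derived table
         SELECT daterange.start_date + INTERVAL (3 * sequence.seq) DAY     AS lower_limit,
                daterange.start_date + INTERVAL (3 * sequence.seq + 3) DAY AS upper_limit
           FROM (SELECT DATE('2017-01-15') AS start_date) daterange
           JOIN (
                  SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
                    FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
                    JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
                    JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C
                   ORDER BY seq
                   LIMIT 40
                ) sequence
       ) ranges
  JOIN my_big_table mbt
    ON mbt.time >  ranges.lower_limit
   AND mbt.time <= ranges.upper_limit
 GROUP BY ranges.upper_limit
 ORDER BY ranges.upper_limit;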

The advantage of this use of cross join is that you only need the privilege to SELECT stuff in the dbms.

An added advantage: if you show it to the DBA who isn't letting you create temp tables, she may take pity on you and let you do this the easy way.

If you happen to be working in MariaDB 10 or better, there are built-in pseudo-tables called sequence tables. For example

    SELECT seq FROM seq_0_to_39

gives you that same sequence of integers [0-39]. That makes this sort of thing less verbose in SQL.
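
For example, the date-range subquery above collapses to something like this (assuming the SEQUENCE engine is available, as it is by default in recent MariaDB 10.x releases):

SELECT DATE('2017-01-15') + INTERVAL (3 * seq) DAY     AS lower_limit,
       DATE('2017-01-15') + INTERVAL (3 * seq + 3) DAY AS upper_limit
  FROM seq_0_to_39;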

O. Jones
  • Thanks O. Jones. This certainly solves the SELECT privilege limitation. I'd have to think about how to make it generic to arbitrary time windows and periods, like two years by every three months, or 24 hours by every 30 minutes. – chaserb May 11 '17 at 01:38
0

Thanks for all the suggestions in the comments. While I was researching your suggestions (like whether I can use table variables across all RDBMS's), I came across this comment that helped me find the answer:

select ts.upper_limit, count(mbt.time) from (
    select '2017-04-05 00:00:00' as lower_limit, '2017-04-10 00:00:00' as upper_limit union 
    select '2017-04-10 00:00:00' as lower_limit, '2017-04-15 00:00:00' as upper_limit union 
    select '2017-04-15 00:00:00' as lower_limit, '2017-04-20 00:00:00' as upper_limit union 
    select '2017-04-20 00:00:00' as lower_limit, '2017-04-25 00:00:00' as upper_limit union 
    select '2017-04-25 00:00:00' as lower_limit, '2017-04-30 00:00:00' as upper_limit union 
    select '2017-04-30 00:00:00' as lower_limit, '2017-05-05 00:00:00' as upper_limit union 
    select '2017-05-05 00:00:00' as lower_limit, '2017-05-10 00:00:00' as upper_limit
) as ts
join my_big_table mbt on 
(
    mbt.time >  ts.lower_limit and 
    mbt.time <= ts.upper_limit
)
group by ts.upper_limit
order by ts.upper_limit;
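
For what it's worth, on engines that support recursive CTEs (PostgreSQL, SQL Server, Oracle 11gR2+, and MySQL only from 8.0), the boundary list wouldn't even have to be written out by hand. A rough sketch in PostgreSQL syntax; SQL Server omits the RECURSIVE keyword and uses DATEADD, and Oracle uses its own date arithmetic:

with recursive time_series (lower_limit, upper_limit) as (
    -- first period boundary
    select timestamp '2017-04-05 00:00:00',
           timestamp '2017-04-10 00:00:00'
    union all
    -- keep adding 5-day periods until the end of the window
    select upper_limit, upper_limit + interval '5 days'
      from time_series
     where upper_limit < timestamp '2017-05-10 00:00:00'
)
select ts.upper_limit, count(mbt.time)
  from time_series ts
  join my_big_table mbt
    on mbt.time >  ts.lower_limit
   and mbt.time <= ts.upper_limit
 group by ts.upper_limit
 order by ts.upper_limit;
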
chaserb