Can I eliminate this subquery in SQL Server?

Question

I have a view where I'm trying to merge two sets of time-series data which are stored in different ways. Set D has a single value stored with every time point. Set R has values which are stored with only an effective date, and they remain in effect until superseded. Here's an example table structure:

Table D
---------+------------------+--------
D_series | D_time           | D_value
---------+------------------+--------
1        | 2012-01-01 00:00 | 4.52
1        | 2012-01-01 01:00 | 2.41
1        | 2012-01-01 02:00 | 5.98
1        | 2012-01-01 03:00 | 3.51
2        | 2012-01-01 00:00 | 4.54
2        | 2012-01-01 01:00 | 6.41
2        | 2012-01-01 02:00 | 5.28
2        | 2012-01-01 03:00 | 3.11
3        | 2012-01-01 00:00 | 4.22
3        | 2012-01-01 01:00 | 9.41
3        | 2012-01-01 02:00 | 3.98
3        | 2012-01-01 03:00 | 3.53

Table L
---------+---------
D_series | R_series
---------+---------
1        | 1
2        | 1
3        | 2

Table RV
---------+----------+--------
R_series | R_header | R_value
---------+----------+--------
1        | 1        | 5.23
1        | 2        | 2.98
2        | 1        | 1.35

Table RH
---------+-----------------
R_header | R_start
---------+-----------------
1        | 2012-01-01 00:00
2        | 2012-01-01 01:49
3        | 2012-01-01 02:10

I want the view to return all points in D_time alongside their corresponding D_value and whatever R_value is current:

---------+------------------+---------+--------
D_series | D_time           | D_value | R_value
---------+------------------+---------+--------
1        | 2012-01-01 00:00 | 4.52    | 5.23
1        | 2012-01-01 01:00 | 2.41    | 5.23
1        | 2012-01-01 02:00 | 5.98    | 2.98
1        | 2012-01-01 03:00 | 3.51    | 2.98
2        | 2012-01-01 00:00 | 4.54    | 5.23
2        | 2012-01-01 01:00 | 6.41    | 5.23
2        | 2012-01-01 02:00 | 5.28    | 2.98
2        | 2012-01-01 03:00 | 3.11    | 2.98
3        | 2012-01-01 00:00 | 4.22    | 1.35
3        | 2012-01-01 01:00 | 9.41    | 1.35
3        | 2012-01-01 02:00 | 3.98    | 1.35
3        | 2012-01-01 03:00 | 3.53    | 1.35

I know that I can do this if I make a subquery and join to it:

select D.D_series, D_time, D_value, RV1.R_value
from D
join L on L.D_series = D.D_series
join RV RV1 on RV1.R_series = L.R_series
join RH RH1 on RH1.R_header = RV1.R_header and RH1.R_start <= D.D_time
left join (
    select R_series, R_value, R_start
    from RV RV2
    join RH RH2 on RH2.R_header = RV2.R_header
) RZ on RZ.R_series = RV1.R_series and RZ.R_start > RH1.R_start
where RZ.R_start is null or RZ.R_start > D_time

But as I understand it, this subquery will fetch every record in RV and RH first, even if the view only involves a few R_series. Is there any way to eliminate this subquery and make it into a standard join?

Can you show the results you want so we don't have to reverse engineer your word problem? — Aaron Bertrand, Mar 08 '13 at 17:57
A sqlfiddle would be nice. Also, what makes you think that every record will be fetched? have you looked at the query plan? — fguchelaar, Mar 08 '13 at 18:01
In R, the effective date table, how is the concept "superseded" represented? Does a value have an ending-date? Or does a series value simply get associated with a new more recent date? I would try to build an inline view of R-series with start and end dates for each of the R-series involved, not for all R-series, and then join the time-series where the D_time is between the start/end datetimes of each of the relevant R-series in the inline view. — Tim, Mar 08 '13 at 18:13

score 0 · Answer 1 · edited May 23 '17 at 12:21

0

Have you thought about using a cross apply? They're good in situations "where there is no simple join condition".

See this answer: When should I use Cross Apply over Inner Join?

In your case, I think something along these lines might work:

select D.D_series, D_time, D_value, current_r_values.R_value
from D
join L on L.D_series = D.D_series
cross apply (
   select top 1 R_series, R_value, R_start
   from RV RV2
   join RH RH2 on RH2.R_header = RV2.R_header
   where RH2.R_Start<D.D_time
   and RV2.R_series = L.R_series
   order by R.R_Start DESC) current_r_values

I know it's not exactly what you were looking for, but I've often found that using a cross apply can be significantly faster than a joining on an inequality.

edited May 23 '17 at 12:21

Community

1
1

answered Mar 08 '13 at 18:23

Daniel Neal

4,165
2
20
34

Note: I don't have access to a SQL Server for testing at the moment, and I usually have to have a few goes at a query like this before it works exactly right. The gist of it is trying to get, for each D_time, the R_Value with the latest time that is still earlier than the given D_time. Man, expressing these things in words is difficult... – Daniel Neal Mar 08 '13 at 18:26
According to the execution plan, the original query was fetching every row in `RH` twice (650 rows). This one seems to be fetching 61250 rows from `RH`. How is this happening? – Chel Mar 08 '13 at 18:50
Damn! I knew I shouldn't have answered while away from a sql terminal. Sorry! I'm most likely missing a where condition on the cross apply... – Daniel Neal Mar 08 '13 at 19:14
Is it giving the right answers just slowly? Are there indexes on the timestamp columns? – Daniel Neal Mar 08 '13 at 19:38
It's giving the right answers, yes, but in practice it takes just as long as the subquery. Table `D` has a clustered index on `D_series, D_time`; table `RV` has a clustered index on `R_series`; and table `RH` has separate non-clustered indexes on `R_header` and `R_start`. – Chel Mar 09 '13 at 00:33

Can I eliminate this subquery in SQL Server?

1 Answers1