0

I'm trying to write a query that pulls multiple columns from two tables and then also looks to see if there is another instance containing the same kind of data within the last 28 days.

I've got the query working and it looks something like this:

SELECT
     a.col_a
    ,a.col_b
    ,a.col_c
    ,b.col_a
    ,b.col_b
    ,CASE WHEN EXSISTS (SELECT a_sub.col_a
                        FROM a_sub
                        INNER JOIN b_sub
                          ON a_sub.x = b_sub.x
                        WHERE
                          b.col_a = b_sub.col_a
                          AND b.col_b <> b_sub.col_b
                          AND a.date > a_sub.date
                          AND a.date <= DATEADD(d, 28, a_sub.date)
                          AND a.col_c = a_sub.col_c
                          AND (a.col_d IS NULL OR a.col_d <> 7)
                          AND (a_sub.col_d IS NULL OR a_sub.col_d <> 7)
                        ) THEN 'Yes'
                          ELSE 'No'
                          END AS IsRepeat28
FROM a
INNER JOIN b ON a.x = b.x

This isn't particularly quick, but nor is it too slow to be a problem. The issue is I need to repeat the above WHEN EXISTS subquery but for 21, 14, 7 and 1 day periods as well making 5 repetitions - the query time then goes from roughly 15 seconds to about 10 minutes.

I also need to adjust the dates so as well as flagging rows that are "IsRepeat" there is also a version for "HasRepeat" which is based on

AND a_sub.date > a.date AND a_sub.date <= DATEADD(d, 28, a.date)

instead of the original

AND a.date > a_sub.date AND a.date <= DATEADD(d, 28, a_sub.date)

This then means having 10 of these EXISTS subqueries and it takes almost 45 minutes.

My question is really twofold; is there a more efficient way of writing this subquery, and is there a better way of repeating it rather than having virtually the same code written out 10 times?

RickyTillson
  • 323
  • 1
  • 2
  • 10
  • 1
    If instead of checking for the existence of the date within the window you instead retrieve the previous matching date you can then use the one date to build flags for each duration. This should keep you down to one round trip. If you're on a modern version of SQL Server then you might be able to get this down to a single read of the table using the LAG window function. – BarneyL Mar 09 '20 at 11:09
  • Changing a parameter and the query takes longer to execute. This reminds me of my problems with SQL creating and reusing inefficient execution plans. I suggest you can try adding the OPTION(RECOMPILE) command to your code, maybe it helps: https://stackoverflow.com/questions/20864934/option-recompile-is-always-faster-why/32929267 - but please be careful, I'm a random stranger on SE and I suggest you something I don't even understang completely myself, so please take your research :) – Annosz Mar 09 '20 at 12:16
  • The place to start on performance issues is usually the _actual execution plan_. Please see [paste the plan](https://www.brentozar.com/pastetheplan/instructions/) for a way to include an execution plan in your question. What, if any, indexes exist? – HABO Mar 09 '20 at 14:22

1 Answers1

0

I call any query which produces data a problem. SQL should only be used to query data. You don't need execution plans if you understand your process and the work it does.

First, separate your main projection as a step 1, either into a temporary table or table variable depending on how many records you have. Caching each set you need in your production query allows you to gauge the load size as a select count(*).

    insert into @mytablevariable values
    SELECT
       a.col_a,
       a.col_b,
       a.col_c,
       b.col_a,
       b.col_b
       -- build this column later
    FROM a
    INNER JOIN b ON a.x = b.x

Next, cache the result of that nested query so the inner join isn't conducted every iteration.

    insert into @mysub values
    SELECT a_sub.col_a
    FROM a_sub
    INNER JOIN b_sub
       ON a_sub.x = b_sub.x
       AND b.col_b <> b_sub.col_b

We created waste deliberately so we can manage it by remodeling @mysub by your processing requirements. Our objective is to eliminate work by fetching the correct data in the correct order.

What stands out the most in this process is the WHERE clause that a date field is within a date range.

    AND a.date > a_sub.date AND a.date <= DATEADD(d, 28, a_sub.date)

Naturally tables are not indexed by date column values and therefore this WHERE comparison must conduct a full table scan, or what we call an Extractor. An extractor pulls a portion (subset) of data from a set. Because this extractor is in a subquery which is executed for each outer record result like any other nested process, the whole table is checked over and over. We always want to minimize extraction to one pass, which is only done when we finish organizing criteria on the extractor. Never nest extractors.

First, minimize the size of the @mysub extractor results by the range of date values needed in the outer @mytablevariable query.

    insert into @mysub values
    SELECT a_sub.col_a
    FROM a_sub
    INNER JOIN b_sub
       ON a_sub.x = b_sub.x
       AND b.col_b <> b_sub.col_b
       and a_sub.date <= DATEADD(d, 28, select min(a.date) from @mytablevariable)
       and a_sub.date > (select max(a.date) from @mytablevariable)

You can then project @mysub into your 21 day subset, and the smaller set again for the others. For the rest of the WHERE clause, expand @mysub with the columns you're comparing. To build your final product, select everything from the first table and conduct one lookup per record as follows.

    select 
        *,
        case 
            when b.col_a in (select col_a from @mysub28) then 'Yes'
            else 'No'
        end as IsRepeat28,
        case 
            when b.col_a in (select col_a from @mysub21) then 'Yes'
            else 'No'
        end as IsRepeat21
        -- rinse, repeat
    from 
    @mytablevariable

Any time you have a production process, separate out your components and work on your extractors' selection criteria so they only make one pass.

RBJ
  • 128
  • 6