2

This query works, but is there a better way to write this query? The current one seems slow. The scenario is very straight forward.

I have two tables, Customers and Payments. The Customers table has what you would expect with Customer info. The Payments table keeps track of the monthly payments that a Customer makes. It has a few fields we need to look at - DueDate, PaymentDate, and CustomerID.

The query I want is I want all Customers who were late by at least 3 months in their first 12 payments. The query I have is below, but it seems to be pretty slow. Is there a better way to write this, than what I have below?

SELECT  CustomerID
FROM    Customers AS C
WHERE   EXISTS ( SELECT DueDate, CustomerID, PaymentDate
                 FROM   ( SELECT TOP 12 *
                          FROM      Payments as P
                          WHERE     P.CustomerID = C.CustomerID
                          ORDER BY PaymentDate
                        ) AS First12Payments
                 WHERE  DATEDIFF(MONTH, First12Payments.DueDate, First12Payments.PaymentDate) > 3 )

Thanks!

Rounder
  • 271
  • 1
  • 6
  • 2
    One easy optimization would be to add a new "MonthsLate" column on the payments table, which is pre-computed. Then you could filter directly on that column rather than on a computation. – Joe Enos Mar 23 '15 at 14:57
  • 2
    @JoeEnos Which would fly if you then indexed the MonthsLate column. It could even be a filtered index `where MonthsLate is not null` so that customers who never pay late won't result in the maintenance overhead. – Brandon Mar 23 '15 at 14:59

1 Answers1

3

Well, the suggestions in the comment by Joe Enos and Brandon are great. However, if you can't add that column there 2 minor changes to your SQL statement that will probably make it a little bit faster. to make it better you will probably need to add indexes to columns DueDate and PaymentDate in Payments table.

SELECT  CustomerID
FROM    Customers AS C
WHERE   EXISTS ( SELECT 1 -- no need for a columns list since you only check for existance
    FROM (SELECT TOP 12 DueDate, PaymentDate -- no need for all the columns, only the ones you use
          FROM      Payments as P
          WHERE     P.CustomerID = C.CustomerID
          ORDER BY PaymentDate
         ) AS First12Payments
        WHERE  DATEDIFF(MONTH, First12Payments.DueDate, First12Payments.PaymentDate) > 3 
Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
  • 1
    `SELECT 1` instead of selecting columns in the exists, although nicer to look at, won't affect performance. SQL Server is smart and knows it's being used for an `EXISTS` and doesn't actually return any data. See http://stackoverflow.com/a/1597487/1630665. Minimizing the columns for the inner select should help, though. – Dave Zych Mar 23 '15 at 15:23
  • @DaveZych Thanks, I didn't know that. it's an advice I've got from a DBA a few years ago, I guess Microsoft has improved SQL Server since then :-) – Zohar Peled Mar 23 '15 at 15:29
  • Thanks @DaveZych and Zohar, I did not know that about the EXISTS keyword either. The 1/0 thing in the example is pretty awesome. – Rounder Mar 23 '15 at 18:47