How can I add the most recent line item in a group by with aggregates?

Question

Hey stackoverflow community,

I have a table of Sales, hypothetical shown below.

Customer       Revenue       State      Date
David          $100          NY         2016-01-01
David          $500          NJ         2016-01-03
Fred           $200          CA         2016-01-01
Fred           $200          CA         2016-01-02

I'm writing a simple query of revenue generated by customer. The output returns as such:

David     $600
Fred      $400

What I want to do now is add the row for the latest purchase date along with the state associated with the latest purchase.

Desired result:

David     $600      2016-01-03        NJ
Fred      $400      2016-01-02        CA

I would like to keep the SQL code as clean as possible. I also want to avoid doing a JOIN to a new query as this query can start to get complex. Any ideas as to how to do so?

what if a customer has 2 transactions in different states on the same latest date? — Vamsi Prabhala, Jan 21 '16 at 19:31
@vkp sorry, my actual use case will be unique timestamps, more granular than dates. so in the case when `Dave` purchases two items at the same time, he will always purchase them from the same location. — mlh351, Jan 21 '16 at 19:43
Possible duplicate of [Select first row in each GROUP BY group?](http://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group) — Tab Alleman, Jan 21 '16 at 19:55

score 1 · Answer 1 · answered Jan 21 '16 at 19:33

You can do this using row_number() (or first_value()) and conditional aggregation:

select customer, sum(revenue), max(date),
       max(case when seqnum = 1 then state end) as mostRecentState
from (select s.*,
             row_number() over (partition by customer order by date desc) as seqnum
      from s
     ) s
group by customer;

score 0 · Answer 2 · answered Jan 21 '16 at 19:54

Hy, try this:

SELECT
   S.CUSTOMER
  ,S.TOTAL
  ,M.DT
  ,M.STATE
FROM
(
  --SUM
  SELECT
     CUSTOMER
    ,SUM(REVENUE) AS TOTAL
  FROM TB
  GROUP BY
    CUSTOMER
) S 
INNER JOIN (
        --MAX DATE
        SELECT
          CUSTOMER
          ,STATE
          ,DT
       FROM TB
       WHERE (CUSTOMER, DT) IN (
                              SELECT AUX.CUSTOMER ,MAX(AUX.DT)  
                              FROM TB AUX 
                              GROUP BY AUX.CUSTOMER
                            )
) M ON (S.CUSTOMER = M.CUSTOMER );

score 0 · Answer 3 · answered Jan 21 '16 at 21:00

0

SELECT Customer
  , (SELECT SUM(Revenue) FROM #t WHERE Customer = xx.Customer ) AS TotalRevenue
  , Dt, STATE
FROM #t xx
WHERE Dt = (SELECT MAX(Dt) FROM #t WHERE Customer = xx.Customer)
ORDER BY Customer

answered Jan 21 '16 at 21:00

FLICKER

6,439
4
45
75

score 0 · Answer 4 · answered Jan 21 '16 at 21:13

On large data sets, I would avoid using the ROW_NUMBER() function due to performance issues. Here is a sample of how I've done this in the past with good performance results:

SELECT DISTINCT
    t.customer,
    totals.totalRevenue,
    t.state,
    t.date
FROM
    @test AS t INNER JOIN (
        SELECT
            customer,
            SUM(revenue) AS totalRevenue,
            MAX(date) AS maxDate
        FROM
            @test AS tSub
        GROUP BY
            customer
    ) AS totals ON t.customer = totals.customer AND t.date = totals.maxDate

If the customer and date fields are unique per record, you can remove the DISTINCT clause at the top.

Try SET STATISTICS IO ON and SET STATISTICS TIME ON before executing your query--less logical disk IO generally means better performance.

How can I add the most recent line item in a group by with aggregates?

4 Answers4