SQL query to SUM sales but only SUM most recent on hand for a given GROUP

Question

I'm having some issues figuring out how GROUP a table in various ways SUMing the numbers of units sold, but only SUMing the most recent on hand units for each item within each GROUP.

Here's an example dataset: http://www.sqlfiddle.com/#!2/3ff18/1

I need to be able to perform GROUP BYs in such a way that the On Hand column is SUMed for only most recent item(s) within each group.

I've had some progress with a "self join" using MAX(date), but I'm not getting the desired results when using various GROUP BYs.

Here's some example outputs I'd like to see given the sqlfiddle.com dataset:

Category     Sold  On Hand
Electronics   500        0
Books         500        0
Other           0      100

Quarter  Category     Sold  On Hand
Q1       Electronics   400      100
Q1       Books         400      100
Q1       Other           0      100
Q2       Electronics   100        0
Q2       Books         100        0
Q2       Other           0      100

Month    Sold  On Hand 
January   300      800 
February  100      700 
March     200      500 
April     200      300 
May         0      300 <- This May entry isn't strickly necessary, but it would be nice
June      100      200 
July      100      100 <- This 100 units On Hand is from Item 987 that hasn't been sold

One area where the MAX(date) approach is tripping me up is with GROUP BY month. If you look in the table just above you'll note that I'd like to see 100 units On Hand in July... which is to say all units have been sold except for the item 987 which was added in January, but has not sold.

A couple of notes:

This is using MySQL but it'd be willing to try PostgreSQL if it has windowing functions that assist with this.
The performance of the solution is fairly vital given there are currently 1.5 million records. And will likely see millions more added.

score 1 · Answer 1 · edited May 23 '17 at 11:49

In Postgres, you have a variety of window functions to chose from.

You have DISTINCT ON to pick a row with the greatest n in a column per group:
Select first row in each GROUP BY group?

And you have functions for date / time algebra and formatting (most of which you have in MySQL as well). So there is no point in storing month and quarter redundantly. Just bloats your table and slows you down. I adapted your table layout accordingly. See the fiddle below.

Working with this accordingly simplified Postgres table:

CREATE TABLE tbl (
   item int
  ,on_hand int
  ,sold int
  ,thedate date
  ,category text
);

Demo EXTRACT() & to_char():

SELECT EXTRACT(quarter FROM thedate)::int AS quarter_int
     , EXTRACT(month   FROM thedate)::int AS month_int
     , to_char(thedate, '"Q"Q')  AS quarter_text
     , to_char(thedate, 'Month') AS month_text
FROM   tbl
LIMIT 1;

Get only the most recent row for each (item, month):

SELECT DISTINCT ON (item, date_trunc('month', thedate))
       *
FROM   tbl
ORDER  BY item, date_trunc('month', thedate), thedate DESC;

Total count per category:

SELECT category, sum(sold) AS sold, min(on_hand) AS on_hand
FROM  (
   SELECT category, sold
        , first_value(on_hand) OVER (PARTITION BY item
                                     ORDER BY thedate DESC) AS on_hand
   FROM tbl
   ) sub
GROUP  BY 1
ORDER  BY 1;

The same for category and month:

SELECT category, to_char(month, 'YYYY-Mon') AS month
     , sum(sold) AS sold, min(on_hand) AS on_hand
FROM  (
   SELECT category, date_trunc('month', thedate) AS month, sold
        , first_value(on_hand) OVER (PARTITION BY item, date_trunc('month', thedate)
                                     ORDER BY thedate DESC) AS on_hand
   FROM tbl
   ) sub
GROUP  BY 1, sub.month
ORDER  BY 1, sub.month;

SQL Fiddle demo.

SQL query to SUM sales but only SUM most recent on hand for a given GROUP

1 Answers1