5

I am trying to utilize the MAX() OVER PARTITION BY function to evaluate the most recent receipt for a specific part that my company has bought. Below is an example table of the information for a few parts from the last year:

| VEND_NUM | VEND_NAME    | RECEIPT_NUM | RECEIPT_ITEM | RECEIPT_DATE |
|----------|--------------|-------------|----------|--------------|
| 100      | SmallTech    | 2001        | 5844HAJ  | 11/22/2017   |
| 100      | SmallTech    | 3188        | 5521LRO  | 12/31/2017   |
| 200      | RealSolution | 5109        | 8715JUI  | 05/01/2017   |
| 100      | SmallTech    | 3232        | 8715JUI  | 11/01/2017   |
| 200      | RealSolution | 2101        | 4715TEN  | 01/01/2017   |

As you can see, the third and fourth row show two different vendors for the SAME part number.

Here is my current query:

WITH

-- various other subqueries above...

    AllData AS
    (
        SELECT VEND_NUM, VEND_NAME, RECEIPT_NUM, RECEIPT_ITEM, RECEIPT_DATE
        FROM tblVend
            INNER JOIN tblReceipt ON VEND_NUM = RECEIPT_VEND_NUM
        WHERE
            VEND_NUM = '100' OR VEND_NUM = '200' AND RECEIPT_DATE >= '01-Jan-2017'
    ),

    SELECT MAX(RECEIPT_DATE) OVER PARTITION BY(RECEIPT_ITEM) AS "Recent Date", RECEIPT_ITEM
    FROM AllData

My return set looks like:

| Recent Date | RECEIPT_ITEM |
|-------------|--------------|
| 11/22/2017  | 5844HAJ      |
| 12/31/2017  | 5521LRO      |
| 11/01/2017  | 8715JUI      |
| 11/01/2017  | 8715JUI      |
| 01/01/2017  | 4715TEN      |

However, it should look like this:

| Recent Date | RECEIPT_ITEM |
|-------------|--------------|
| 11/22/2017  | 5844HAJ      |
| 12/31/2017  | 5521LRO      |
| 11/01/2017  | 8715JUI      |
| 01/01/2017  | 4715TEN      |

Can anybody please offer advice as to what I'm doing wrong? It looks like it is simply replacing the most recent date, not giving me just the row I want that is most recent.

Ultimately, I would like for my table to look like this. However, I don't know how to use the MAX() or MAX() OVER PARTITION BY() functions properly to allow for this:

| VEND_NUM | VEND_NAME    | RECEIPT_NUM | RECEIPT_ITEM | RECEIPT_DATE |
|----------|--------------|-------------|----------|--------------|
| 100      | SmallTech    | 2001        | 5844HAJ  | 11/22/2017   |
| 100      | SmallTech    | 3188        | 5521LRO  | 12/31/2017   |
| 100      | SmallTech    | 3232        | 8715JUI  | 11/01/2017   |
| 200      | RealSolution | 2101        | 4715TEN  | 01/01/2017   |
Sanushi Salgado
  • 1,195
  • 1
  • 11
  • 18
artemis
  • 6,857
  • 11
  • 46
  • 99
  • maybe use `partition by trim(receipt_item)` .. there might have been an extra space(s) in that column. – Vamsi Prabhala Mar 22 '18 at 18:41
  • Hi @VamsiPrabhala, thanks a lot for the suggestion. Unfortunately, same thing...I don't know why my MAX() isn't working. – artemis Mar 22 '18 at 18:42
  • Sounds like you want to filter out all but the latest receipt for each part? If so, `max()` is not necessarily the way to do it. – William Robertson Mar 22 '18 at 19:14
  • @WilliamRobertson correct William, sorry for poor wording. I have been researching how to approach this problem for awhile, and I mistakenly thought max was my best approach. – artemis Mar 22 '18 at 19:16

4 Answers4

6

Use window function ROW_NUMBER() OVER (PARTITION BY receipt_item ORDER BY receipt_date DESC) to assign a sequence number to each row. The row with the most recent receipt_date for a receipt_item will be numbered as 1.

WITH
-- various other subqueries above...

    AllData AS
    (
        SELECT VEND_NUM, VEND_NAME, RECEIPT_NUM, RECEIPT_ITEM, RECEIPT_DATE,
        ROW_NUMBER() OVER (PARTITION BY RECEIPT_ITEM ORDER BY RECEIPT_DATE DESC ) AS RN
        FROM tblVend
            INNER JOIN tblReceipt ON VEND_NUM = RECEIPT_VEND_NUM
        WHERE
            VEND_NUM IN ( '100','200')  AND RECEIPT_DATE >= '01-Jan-2017'
    )
   SELECT VEND_NUM, VEND_NAME, RECEIPT_NUM, RECEIPT_ITEM, RECEIPT_DATE
   FROM AllData WHERE RN = 1
kc2018
  • 1,440
  • 1
  • 8
  • 8
  • thank you very much for the explanation and the code. This does work. However, I believe I sold myself short in my question and I accept error for that. I would like to also, ultimately, grab the "RECEIPT_NUM" for which the "RECEIPT_ITEM" came in on. Is that possible to be added? – artemis Mar 22 '18 at 18:53
  • I've edited my question to reflect my ultimate end goal – artemis Mar 22 '18 at 18:56
  • Hi, @kc2018. Thanks for your edit. I keep getting "ORDER_BY not found in window function" errors, and I'm trying to debug now. It has to do with the "t' subquery – artemis Mar 22 '18 at 19:17
  • Do you have to use cte? – kc2018 Mar 22 '18 at 19:31
  • Frankly, and I'm sorry, I don't know what cte is. But what I can tell you is that I have no power to edit anything administrative on the DB side. I have to work with just being able to query and read from the database. – artemis Mar 22 '18 at 19:32
  • I also got a missing right parenthesis error before I edited your suggestion, just as an FYI – artemis Mar 22 '18 at 19:33
  • CTE = common table expression; it's MS SQL's made up term for creating subqueries with the `WITH` expression. – David Faber Mar 22 '18 at 19:42
  • @kc2018 yes, I need to use CTE. – artemis Mar 22 '18 at 19:49
  • Moved the row_number() into the CTE. Corrected the syntax error on missing parenthesis. Also removed the comma after the closing ) close to the end. Give it a try? – kc2018 Mar 22 '18 at 20:05
  • This answer could do with some explanation in order to be useful to others. Just pasting modified SQL doesn't aid understanding or make it obviously applicable to other areas. Add some explanatory detail and I'll upvote. – BriteSponge Mar 23 '18 at 11:09
4

I see a couple of issues here. One, the syntax for using the aggregate function MAX() as an analytic function (which is what Oracle helpfully calls a window function) looks like this:

MAX(receipt_date) OVER ( PARTITION BY receipt_item )

(note the position of the parentheses). Second, from your desired result set, you don't actually want a window function, you want to aggregate. A window (or analytic) function will always return a row for each row in its partition; that's just the way it works. So I think what you want is this:

WITH
-- various other subqueries above...
AllData AS
(
    SELECT VEND_NUM, VEND_NAME, RECEIPT_NUM, RECEIPT_ITEM, RECEIPT_DATE
      FROM tblVend
     INNER JOIN tblReceipt ON VEND_NUM = RECEIPT_VEND_NUM
     WHERE ( VEND_NUM = '100' OR VEND_NUM = '200' ) AND RECEIPT_DATE >= DATE'2017-01-01'
)
SELECT VEND_NUM, VEND_NAME, RECEIPT_NUM, RECEIPT_ITEM, MAX(RECEIPT_DATE)
  FROM AllData
 GROUP BY VEND_NUM, VEND_NAME, RECEIPT_NUM, RECEIPT_ITEM;

Now I made some small changes to the above, such as wrapping parentheses around the OR conditions (using IN ('100','200') might be even better) since AND takes precedence over OR (so your query would have gotten results where VEND_NUM = '100' OR ( VEND_NUM = '200' RECEIPT_DATE >= DATE'2017-01-01' ) ... but maybe that's what you wanted?).

David Faber
  • 12,277
  • 2
  • 29
  • 40
  • Hi, @DavidFaber. Thanks a lot for your answer, your inputs, and your explanations. Obviously I'm still a rookie but I am trying to learn. Your solution is akin to the one that kc had originally posted below. However, if ultimately I want to get the final table listed at the end of my question, how can I edit your example? – artemis Mar 22 '18 at 19:49
  • I made an update to my answer but I have to caution that I don't have a good sense of your data. – David Faber Mar 22 '18 at 20:09
  • Hey, @DavidFaber. I appreciate your input. Unfortunately, that did not yield the results I am looking for. I did not anticipate this problem being so hard. – artemis Mar 28 '18 at 14:52
1

Just passing by but I think you have to format the date to a 'YYYY-MM-DD' format so that it doesn't consider the "time".

Shinya
  • 11
  • 1
0

THIS ANSWERS THE ORIGINAL VERSION OF THE QUESTION.

Your where clause should probably look like this:

 WHERE VEND_NUM IN ('100', '200') AND RECEIPT_DATE >= DATE '2017-01-01'

It is quite possible that what you want is simply:

SELECT DISTINCT RECEIPT_DATE, RECEIPT_ITEM
FROM tblVend INNER JOIN
     tblReceipt
     ON VEND_NUM = RECEIPT_VEND_NUM
WHERE VEND_NUM IN ('100', '200') AND RECEIPT_DATE >= DATE '2017-01-01';

At the very least, this returns what you want to return.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I want to say thank you very much for the explanation and the code. This does work in the situation described above. However, I believe I sold myself short in my question and I accept error for that. I would like to also, ultimately, grab the "RECEIPT_NUM" for which the "RECEIPT_ITEM" came in on. Is that possible to be added? That way, I can use the "RECEIPT_NUM" in later subqueries to add in more data. – artemis Mar 22 '18 at 18:54
  • I've edited my question to reflect the intended end goal. – artemis Mar 22 '18 at 18:56