I'm creating an open order report using SQL to query data from AWS Redshift.
My current table has duplicates (same order, ln, and subln numbers)
Order | Ln | SubLn | Qty | ShpDt |
---|---|---|---|---|
4166 | 010 | 00 | 3 | 2021-01-06 |
4166 | 010 | 00 | 3 | 2021-01-09 |
4167 | 011 | 00 | 9 | 2021-02-01 |
4167 | 011 | 00 | 9 | 2021-01-28 |
4167 | 011 | 01 | 8 | 2020-12-29 |
I need to remove duplicates using order, ln, and subln columns as group identifiers. I want to calculate SUM of qty and keep most recent ship date for the order to achieve this result:
Order | Ln | SubLn | TotQty | Shipped |
---|---|---|---|---|
4166 | 010 | 00 | 6 | 2021-01-09 |
4167 | 011 | 00 | 18 | 2021-02-01 |
4167 | 011 | 01 | 8 | 2020-12-29 |
After reading (How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?) I tried the code below, which only aggregated the fields and did not remove duplicates. What am I missing?
FROM table1 AS t1
JOIN (SELECT t1.order, t1.ln, t1.subln, SUM(qty) AS totqty, MAX(shpdt) AS shipped
FROM table1 AS t1
GROUP BY order, ln, subln) as t2
ON tb1.order = tb2.order AND tb1.ln = tb2.ln AND tb1.subln = tb2.subln