How can I removes duplicates by using MAX and SUM per group identifier?

Question

I'm creating an open order report using SQL to query data from AWS Redshift.

My current table has duplicates (same order, ln, and subln numbers)

Order	Ln	SubLn	Qty	ShpDt
4166	010	00	3	2021-01-06
4166	010	00	3	2021-01-09
4167	011	00	9	2021-02-01
4167	011	00	9	2021-01-28
4167	011	01	8	2020-12-29

I need to remove duplicates using order, ln, and subln columns as group identifiers. I want to calculate SUM of qty and keep most recent ship date for the order to achieve this result:

Order	Ln	SubLn	TotQty	Shipped
4166	010	00	6	2021-01-09
4167	011	00	18	2021-02-01
4167	011	01	8	2020-12-29

After reading (How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?) I tried the code below, which only aggregated the fields and did not remove duplicates. What am I missing?

FROM table1 AS t1
JOIN (SELECT t1.order, t1.ln, t1.subln, SUM(qty) AS totqty, MAX(shpdt) AS shipped
      FROM table1 AS t1
      GROUP BY order, ln, subln) as t2
ON tb1.order = tb2.order AND tb1.ln = tb2.ln AND tb1.subln = tb2.subln

i don't see duplicated rows in your result . (there are only some partial repeat value .. eg Ln 11 or .SubLn 00) what do you really mean with duplicate?? — ScaisEdge, Feb 09 '21 at 20:35
I can't see any duplicated either - all result rows have a different subln. Did you mean to GROUP BY subln? Did you mean to select it from table1 and include it in your join? — openshac, Feb 09 '21 at 20:36
Of course you are getting duplicates. Your derived table does the grouping and summing, shouldn't have dupes there. Then you join it back to the original table, which has the duplicates. — Andrew, Feb 09 '21 at 20:36
The report contains order numbers which consist of order, line, and sub line. In the original table, order 4166 line 10 is duplicated and order 4167 line 11 is duplicated. — Maysie Hewitt, Feb 09 '21 at 20:42
@Andrew What do you suggest? I tried that method after reading other posts about greatest-n-per-group. I'm new to sql so I greatly appreciate the help — Maysie Hewitt, Feb 09 '21 at 20:49
@MaysieHewitt - What are you trying to accomplish by joining back to the main table? If you just run the inner query, that seems to give you the desired result. — Andrew, Feb 09 '21 at 20:58
@Andrew I get "ERROR: column must appear in the GROUP BY clause or be used in an aggregate function" when I don't use the subquery — Maysie Hewitt, Feb 09 '21 at 21:31
@a_horse_with_no_name Thanks for pointing that out, I thought workbench/j was tagged already. I'm using workbench to access data in AWS redshift if that helps. — Maysie Hewitt, Feb 10 '21 at 13:23
The client tool (= SQL Workbench/J) is irrelevant to the question. — , Feb 10 '21 at 13:29

score 0 · Answer 1 · answered Feb 09 '21 at 20:45

I need to remove duplicates using order, ln, and subln columns as group identifiers. I want to calculate SUM of qty and keep most recent ship date for the order to achieve this result:

Based on the dataset resulted from the query, the dataset is unique on those 3 columns

How can I removes duplicates by using MAX and SUM per group identifier?

1 Answers1