0

I'm creating an open order report using SQL to query data from AWS Redshift.

My current table has duplicates (same order, ln, and subln numbers)

Order Ln SubLn Qty ShpDt
4166 010 00 3 2021-01-06
4166 010 00 3 2021-01-09
4167 011 00 9 2021-02-01
4167 011 00 9 2021-01-28
4167 011 01 8 2020-12-29

I need to remove duplicates using order, ln, and subln columns as group identifiers. I want to calculate SUM of qty and keep most recent ship date for the order to achieve this result:

Order Ln SubLn TotQty Shipped
4166 010 00 6 2021-01-09
4167 011 00 18 2021-02-01
4167 011 01 8 2020-12-29

After reading (How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?) I tried the code below, which only aggregated the fields and did not remove duplicates. What am I missing?

FROM table1 AS t1
JOIN (SELECT t1.order, t1.ln, t1.subln, SUM(qty) AS totqty, MAX(shpdt) AS shipped
      FROM table1 AS t1
      GROUP BY order, ln, subln) as t2
ON tb1.order = tb2.order AND tb1.ln = tb2.ln AND tb1.subln = tb2.subln

     
  • 1
    i don't see duplicated rows in your result . (there are only some partial repeat value .. eg Ln 11 or .SubLn 00) what do you really mean with duplicate?? – ScaisEdge Feb 09 '21 at 20:35
  • I can't see any duplicated either - all result rows have a different subln. Did you mean to GROUP BY subln? Did you mean to select it from table1 and include it in your join? – openshac Feb 09 '21 at 20:36
  • Of course you are getting duplicates. Your derived table does the grouping and summing, shouldn't have dupes there. Then you join it back to the original table, which has the duplicates. – Andrew Feb 09 '21 at 20:36
  • The report contains order numbers which consist of order, line, and sub line. In the original table, order 4166 line 10 is duplicated and order 4167 line 11 is duplicated. – Maysie Hewitt Feb 09 '21 at 20:42
  • @Andrew What do you suggest? I tried that method after reading other posts about greatest-n-per-group. I'm new to sql so I greatly appreciate the help – Maysie Hewitt Feb 09 '21 at 20:49
  • @MaysieHewitt - What are you trying to accomplish by joining back to the main table? If you just run the inner query, that seems to give you the desired result. – Andrew Feb 09 '21 at 20:58
  • @Andrew I get "ERROR: column must appear in the GROUP BY clause or be used in an aggregate function" when I don't use the subquery – Maysie Hewitt Feb 09 '21 at 21:31
  • @a_horse_with_no_name Thanks for pointing that out, I thought workbench/j was tagged already. I'm using workbench to access data in AWS redshift if that helps. – Maysie Hewitt Feb 10 '21 at 13:23
  • The client tool (= SQL Workbench/J) is irrelevant to the question. –  Feb 10 '21 at 13:29
  • @a_horse_with_no_name got it – Maysie Hewitt Feb 10 '21 at 13:35

1 Answers1

0

I need to remove duplicates using order, ln, and subln columns as group identifiers. I want to calculate SUM of qty and keep most recent ship date for the order to achieve this result:

Based on the dataset resulted from the query, the dataset is unique on those 3 columns

enter image description here