1

So I got a request for some data from someone in my company. This data request requires SQL query of multiple columns from multiple tables BUT can only be the most recent incident from a column from one of those tables. Here is the kicker...each table has a single column that it shares with another table. So to get this "report" I have to do it piece by piece by piece.

Here is how my initial query looked before I realized I only need the most recent update to one of the tables:

SELECT a.description  AS "Description", 
       a.pricing      AS "Price", 
       b.id           AS "ID", 
       c.descriptionb AS "DescriptionB", 
       c.date         AS "date", 
       d.descriptionc AS "DescriptionC" 
FROM   database.table1 a, 
       database.table2 b, 
       database.table3 c, 
       database.table4 d 
WHERE  a.description = b.descriptive_info 
       AND b.id = c.comp_id 
       AND c.descriptionb = d.long_description
       AND d.id_for_a = a.id
       AND a.company IN ( '000', '001', '002', '003', '004' ) 
       AND b.expdate >= Now()

I realized that the "c.date" above needs to only display the most recent date for each unique ID/DescriptionC.

Here is an example result of the initial query:

Description|Price   |ID       |DescriptionB   |date     |DescriptionC
---------------------------------------------------------------------
Computer   |300     |554      |5% Off         |3/2/2010 |Includes CPU
Computer   |300     |554      |5% Off         |3/2/2010 |Includes DOG
Computer   |300     |554      |5% Off         |3/2/2010 |Includes CAT 
Computer   |300     |554      |9% Off         |4/3/2011 |Includes CPU
Computer   |300     |554      |9% Off         |4/3/2011 |Includes DOG
Computer   |300     |554      |9% Off         |4/3/2011 |Includes CAT 
Computer   |300     |554      |7% Off         |9/1/2019 |Includes CPU
Computer   |300     |554      |7% Off         |9/1/2019 |Includes DOG
Computer   |300     |554      |7% Off         |9/1/2019 |Includes CAT 
Printer    |75      |801      |3% Off         |6/3/2012 |Includes DOS
Printer    |75      |801      |3% Off         |6/3/2012 |Includes PIG
Printer    |75      |801      |3% Off         |6/3/2012 |Includes RAT
Printer    |75      |801      |9% Off         |8/3/2013 |Includes DOS
Printer    |75      |801      |9% Off         |8/3/2013 |Includes PIG
Printer    |75      |801      |9% Off         |8/3/2013 |Includes RAT
Printer    |75      |801      |1% Off         |1/3/2019 |Includes DOS
Printer    |75      |801      |1% Off         |1/3/2019 |Includes PIG
Printer    |75      |801      |1% Off         |1/3/2019 |Includes RAT

Here is the result of the query by Laurenz below:

Description|Price   |ID       |DescriptionB   |date     |DescriptionC
---------------------------------------------------------------------
Computer   |300     |554      |7% Off         |9/1/2019 |Includes CAT

...close but not quite there yet.

Desired Result:

Description|Price   |ID       |DescriptionB   |date     |DescriptionC
---------------------------------------------------------------------
Computer   |300     |554      |7% Off         |9/1/2019 |Includes CPU
Computer   |300     |554      |7% Off         |9/1/2019 |Includes DOG
Computer   |300     |554      |7% Off         |9/1/2019 |Includes CAT 
Printer    |75      |801      |1% Off         |1/3/2019 |Includes DOS
Printer    |75      |801      |1% Off         |1/3/2019 |Includes PIG
Printer    |75      |801      |1% Off         |1/3/2019 |Includes RAT

As you can see there are multiple dates for the "same product", duplicate products with varying descriptions, etc. I basically only want the rows with the most recent "date" for each unique ID/DescriptionC. Hopefully this is a little easier to understand than my original post.

BTW these are simplified examples as I don't want to get in trouble by my company but the query and concepts are the same. If you can imagine multiple products with numerous instances of each product multiple times you can imagine how big the data set could get. I only care about the most recent instance of each unique ID/DescriptionC.

GMB
  • 216,147
  • 25
  • 84
  • 135
user2079024
  • 161
  • 2
  • 8
  • 1
    A few rows of data and the corresponding expected output would probably help us provide a more accurate answer. – GMB Sep 09 '19 at 22:43
  • So you need just the newest row of what groups? How do you see, which row is newer than another? Perhaps showing all fields of all tables might help – Islingre Sep 09 '19 at 22:44
  • Possible duplicate of [Select first row in each GROUP BY group?](https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group) – Clockwork-Muse Sep 09 '19 at 23:33

3 Answers3

4

I understand that, out of the results from the current query, you just want to select the one that has the maximum value on c.date.

One solution would be to turn the existing query to a subquery, and use ROW_NUMBER() to rank the records by descending c.date. Then, the outer query can just filter on the highest ranked record.

Query:

SELECT *
FROM (
    SELECT 
        a.description  AS "Description", 
        a.pricing      AS "Price", 
        b.string       AS "String", 
        c.description  AS "Description", 
        c.date         AS "date", 
        d.descriptionb AS "DescriptionB",
        ROW_NUMBER() OVER (ORDER BY c.date DESC) AS rn
    FROM
       database.table1 a 
       INNER JOIN database.table2 b ON a.id = b.table1_id 
       INNER JOIN database.table3 c ON b.element = c.table2_element AND b.expdate >= Now()
       INNER JOIN database.table4 d ON c.value = d.table3_value 
    WHERE 
        a.company IN ( '000', '001', '002', '003', '004' ) 
) x WHERE rn = 1;

PS:

  • always prefer explicit joins instead of old-school, implicit joins; I changed the initial query accordingly

  • if you need the maximum date by partition (which is not obvious to tell without seeing sample data), then you simply need to add a PARTITION BY clause to the ROW_NUMBER() function.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
GMB
  • 216,147
  • 25
  • 84
  • 135
  • Awesome, that removed the error, however, now it doesn't like the second to last "WHERE" clause: ERROR: syntax error at or near "WHERE" Here is the a snippet of the ACTUAL much bigger query everything surrounding that "WHERE" statement: INNER JOIN viper_logical_1.promotion_relatedmarketitems i ON i.element = b.externalid WHERE a.externalid IN ('5957962','2882541') ) x WHERE rn = 1 LINE 35: WHERE – user2079024 Sep 10 '19 at 01:42
  • @user2079024: well it’s hard to tell what is wrong in your real query... One thing you could do is try to narrow down the problem: first run the subquery and see if it works, then run the whole query. – GMB Sep 10 '19 at 18:31
  • Okay...so this query is close but not quite there yet. I updated my original description of the issue, results, and desired results. Hopefully this will make things a bit more clear as to what I'm attempting to accomplish. Again, any additional help would be greatly appreciated! – user2079024 Sep 11 '19 at 07:58
2

The following sql can be used to solve the question:

SELECT *
FROM (
    SELECT 
        a.description  AS "Description",

        a.pricing      AS "Price", 
        b.string       AS "String", 
        c.description  AS "Description", 
        c.date         AS "date", 
        d.descriptionb AS "DescriptionB",
        ROW_NUMBER() OVER (ORDER BY c.date DESC PARTITION BY B.ID ) AS rn
    FROM
       database.table1 a 
       INNER JOIN database.table2 b ON a.id = b.table1_id 
       INNER JOIN database.table3 c ON b.element = c.table2_element AND b.expdate >= Now()
       INNER JOIN database.table4 d ON c.value = d.table3_value 
    WHERE 
        a.company IN ( '000', '001', '002', '003', '004' ) 
) x WHERE rn = 1;
MaartenDev
  • 5,631
  • 5
  • 21
  • 33
Hal McGee
  • 21
  • 1
1

Thank you so much everyone. The answer from Hal McGee was basically correct! I just had to tweak a few things. Here is an example of the full query I ended up using that functioned as I was needing it to:

SELECT *
FROM (
    SELECT 
        a.description  AS "Description",
        a.pricing      AS "Price", 
        b.string       AS "String", 
        c.description  AS "Description", 
        c.date         AS "date", 
        d.descriptionb AS "DescriptionB",
        ROW_NUMBER() OVER (PARTITION BY d.descriptionc, b.id ORDER BY c.date DESC ) AS rn
    FROM
       database.table1 a 
       INNER JOIN database.table2 b ON a.id = b.table1_id 
       INNER JOIN database.table3 c ON b.element = c.table2_element AND b.expdate >= Now()
       INNER JOIN database.table4 d ON c.value = d.table3_value 
    WHERE 
        a.company IN ( '000', '001', '002', '003', '004' ) 
) x WHERE rn = 1;
user2079024
  • 161
  • 2
  • 8