Using DB2 SQL, how can I avoid returning rows that aren't the most recent, for a condition that is NOT part of my where clause?

Question

SCENARIO DATA:

Row   Col1    Col2    Col3         Col4    Col5
1     Bob     Smith   2000-01-01   John    Doe
2     Bob     Smith   2010-01-01   Jane    Jones
3     Ted     Jones   2005-01-01   Pete    Mills
4     Ted     Jones   2008-01-01   John    Doe

My SQL:

select col1, col2 from schema.table where col4='John' and col5='Doe'

I have no clue how to restrict my data as per need stated below.

I'm searching for where John Doe is in column 4/5 BUT I only want to retrieve the row if it's the current relationship (based on column 3) for the person in columns 1/2.

In the above data, I DO want the 4th row because John Doe is currently related to Ted Jones. But I do NOT want to retrieve the first row...because the John Doe row is NOT the current relation for Bob Smith. The current relation for Bob Smith is to Jane Jones.

IF I were searching for Pete Mills, I wouldn't want to find anything because Pete Mills is not the current relationship for Ted Jones.

Ugh. Sorry...the sample data scenario didn't post right. Looked fine when i entered it. Maybe paste to a text editor ? — Rich, Jan 18 '19 at 20:30
I fixed that up for you. In future, if you want to add code formatting to any text on Stack Overflow, just highlight the lines, and click the `{ }` icon in the text editor. — e_i_pi, Jan 18 '19 at 20:33
Possible duplicate of [Select first row in each GROUP BY group?](https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group) — Clockwork-Muse, Jan 18 '19 at 20:43
e_i_pi (e-i-e-i-o lol ), Clockwork .... thank you so much. Clockwork, my problem with not finding existing answer is mostly I didn't know how to properly phrase my question. Your patience is appreciated. Happy New Year all. — Rich, Jan 18 '19 at 20:59
@Rich - that's fine; that's _why_ "close as duplicate" exists. It says "yes, there is an answer, and the canonical version is over here". So the next person who phrases the question the same way you did gets pointed to the more "proper" version. If however, that answer doesn't enable you to solve your problem, then you need to add clarifying reasons why it doesn't fit. — Clockwork-Muse, Jan 22 '19 at 18:56

score 3 · Answer 1 · answered Jan 18 '19 at 20:42

You need to use something that DB2 call OLAP functions (other RDMS engines tend to call them window functions - both terms are valid). An OLAP function allows you to return analytic data about a row, e.g. what number it is in a sorted list.

For your query, you only want to return the first row for each person, so we use an OLAP function called ROW_NUMBER(). When we use this function, we want to partition over the Col1/Col2 data (i.e. we are grouping on that data) and then reverse sort on Col3 to get the latest result. After that, we want to pick the rows that have the top value in the OLAP function, like so:

SELECT *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY Col3 DESC) AS rn
    FROM schema.table
) AS ranked_scenarios
WHERE rn = 1

Using DB2 SQL, how can I avoid returning rows that aren't the most recent, for a condition that is NOT part of my where clause?

1 Answers1