How can I join two tables with a many-to-one relationship ordered by date?

Question

I need to join two tables and get the most recent record only. Here is the basic form:

table1.id | table1.region | table1.important_col1
1         | NORTH AMERICA | abc
2         | CHINA         | def
2         | NORTH AMERICA | hij

table2.id | table2.region | table2.transaction_date | table2.important_col2
1         | NORTH AMERICA | 2/13/2019               | xyz
1         | NORTH AMERICA | 1/13/2019               | zzz
1         | NORTH AMERICA | 12/13/2018              | xxx

desired result:

1 | NORTH AMERICA | 2/13/2019 | abc | xyz

I wanted to use this answer but it seems like I can't use it if I need to group by and then order by descending date. I will need information in multiple columns on the right hand side, but do not want duplicate rows on the left hand side.

The right hand side may have up to 100s of records per id, but I just need something that works for now. Thanks in advance.

edit: I also need to filter the right hand side on other criteria so a simple MAX(table2.transaction_date) won't work.

Subquery and CTE. Not sure if it's an indexing issue but the last query I tried returned nothing. I might not be applying a WHERE condition properly. — st_9, Feb 13 '19 at 22:30
This is not clear--eg "join two tables and get the most recent record only" & "filter the right hand side on other criteria". When it's clear it will be a faq. You would likely find it if, as you should before considering posting any question, you googled 'site:stackoverflow.com relational algebra' with a clear concise precise statement of your question/problem/goal with & without your particular strings/names. There is even a tag: [tag:greatest-n-per-group]. PS See [ask] & hits googling 'stackexchange homework'--show relevant parts you can do. Also for code questions give a [mcve]. — philipxy, Feb 14 '19 at 02:44
I tried this but wasn't getting any records from the examples I found. In hindsight, the id # I arbitrarily chose as a test query did not have any transaction records (about 1% of IDs are like this...) The answer I marked as correct works as intended and is also efficient enough that I can check all IDs fairly quickly. Thanks for the tips! — st_9, Feb 14 '19 at 19:26

UV. · Accepted Answer · 2019-02-14T04:23:46.390

You can filter your table using internal window function, I used LAG for this example, but you can use ROW_NUMBER and filter several records. Using sliding windows does not change the number of records or counted as SQL aggregation, i.e. you filter using where rather than with having.

SELECT 
     t1.id
    ,t2.transaction_date
    ,t1.region 
    ,t1.col1
    ,t2.important_col2
FROM table1 AS t1
OUTER APPLY (
    SELECT 
         id
        ,transaction_date
        ,LAG(transaction_date,1) over  (partition by id order by transaction_date desc) as prev_td
        ,important_col2
        FROM table2
        -- WHERE filter_by_col=1  -- additonal "right side" filtering
) as t2
where t1.id = t2.id
and t2.prev_td is null

Output:

1   2019-02-13 00:00:00.000 NORTH AMERICA   abc xyz

I used this to test the above query:

create table table1
    (id int,
    region varchar(30),
    col1 varchar(100));

insert into table1
values (1 ,'NORTH AMERICA' ,'abc'),
(2,'CHINA','def'),
(2,'NORTH AMERICA','hij');

create table table2
    (id int,
    region varchar(30),
    transaction_date datetime,
    important_col2 varchar(100))

insert into table2
values
    (1 ,'NORTH AMERICA',convert(datetime, '02/13/19', 1),'xyz'),
    (1 ,'NORTH AMERICA',convert(datetime, '01/13/19',1),'zzz'),
    (1 ,'NORTH AMERICA',convert(datetime, '12/13/18',1),'xxx')

Derviş Kayımbaşıoğlu · Answer 2 · 2019-02-13T21:59:43.177

0

Try in this way:

select table11.id, table1.region, max(table2.transaction_date) transaction_date
from table1
inner join table2
  on table1.id = table2.id
group by table1.id, table1.region

edited Feb 13 '19 at 21:59

answered Feb 13 '19 at 21:51

Derviş Kayımbaşıoğlu

28,492
4
50
72

You might want to fix your typo :) – Tony Feb 13 '19 at 21:55
1

Thank you. I fixed it. however please keep in mind that you can always edit typos and send it for review. – Derviş Kayımbaşıoğlu Feb 13 '19 at 22:00
This answer looks good and concise, however I should add that I need to do additional filtering on the right hand side to remove unwanted rows (type of transaction, etc). So the max won't work in every case, sometimes the record I want will be the 3rd or 4th most recent. Thanks for replying – st_9 Feb 13 '19 at 22:26

score 0 · Answer 3 · answered Feb 13 '19 at 22:09

If there are more columns in table2 (other than transaction date) that you want to display as well, then aggregation alone cannot solve your question.

In MySQL 8.0 you can use window function ROW_NUMBER() to identify the most recent transaction record, as follows :

SELECT x.*
FROM (
    SELECT 
        t1.*,
        t2.*, 
        ROW_NUMBER() OVER(PARTITION BY t2.region ORDER BY t2.transaction_date DESC) rn
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.region = t2.region
) x
WHERE x.rn = 1

In earlier versions of MySQL, one solution is to add a NOT EXISTS with a correlated subquery that ensures that we are joining with the most recent transaction for the current region :

SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN table2 t2
    ON t1.region = t2.region
    AND NOT EXISTS (
        SELECT 1
        FROM table2
        WHERE region = t2.region AND transaction_date > t2.transaction_date
    )

How can I join two tables with a many-to-one relationship ordered by date?

3 Answers3