What's the approach to joining these 2 tables?

Question

Say I have 2 tables A and B which contain information for start and end times respectively. The primary key is a combination of id and the timestamp. Thus, no 2 records can have the same id and timestamp

A

id | start time
1 | 2016-02-06 17:03
1 | 2016-03-09 18:09
2 | 2017-02-07 23:34
3 | 2016-02-07 19:12
3 | 2016-02-07 23:52
...

B

id | end time
1 | 2016-02-06 18:32
1 | 2016-03-09 21:11
2 | 2017-02-08 01:22
3 | 2016-02-07 21:32
3 | 2016-02-08 02:11
...

My end result should be something like

id | start time | end time
1 | 2016-02-06 17:03 | 2016-02-06 18:32
1 | 2016-03-09 18:09 | 2016-03-09 21:11
2 | 2017-02-07 23:34 | 2017-02-08 01:22
3 | 2016-02-07 19:12 | 2016-02-07 21:32
3 | 2016-02-07 23:52 | 2016-02-08 02:11
...

Obviously I can't join on just ID as the ids 1 and 3 each appear twice. I can't join on the day either as the 3rd and 5th records span across 2 different days. So is there a way to join these 2 tables? Any help would be much appreciated! Thanks!

What is the common relationship between the tables that you want to use to join them? — Barmar, Nov 19 '19 at 22:35
Why are the start and end times in different tables in the first place? — Barmar, Nov 19 '19 at 22:36
I'm confused why exactly you can't just join on ID. That wouldn't cause you to have duplicated records. — SchrodingersStat, Nov 19 '19 at 22:36
@SchrodingersStat You'll get a cross product between all the start and end times. — Barmar, Nov 19 '19 at 22:37
See https://stackoverflow.com/questions/34667070/update-a-mysql-table-with-record-rankings-within-groups for how to add a counter ordered by time within each `id` group in the two tables. Then you can join on `id` and `counter`. — Barmar, Nov 19 '19 at 22:39
Why is the fourth row shown for A `('3', 2016-02-07 19:12')` matched with the fourth row shown for B and not the fifth row ? There is probably example data that will show that just having start time before end time would not eliminate all overlap, and we would still have potential for cross product. As @Barmar suggests, I'd use inline view to assign a counter, from 1 to n for each `id` value, ordering by datetime; then join on combination of `id` and `counter`. — spencer7593, Nov 19 '19 at 22:40
It would also be better if the tables had a separate key that identifies the event uniquely, and can then be used to join the tables. — Barmar, Nov 19 '19 at 22:40
@SchrodingersStat If I join based just on ID, then each record with id 1 in A will be joined with each record with id 1 in B. Essentially we will have a record in the joint table with start time = March 9 2016 18:09 but end time = February 6 2016 18:32. In other words, it implies that the end time is before the start time — Dynastywarriorlord07, Nov 19 '19 at 22:41
@spencer7593 I think he wants to pair each start time with the closest following end time for the same ID. — Barmar, Nov 19 '19 at 22:43
@Dynastywarriorlord07 Ahh gotcha, I did not look at your table close enough. — SchrodingersStat, Nov 19 '19 at 22:44
If you're using MySQL 8, you can use a window function to assign rank within the groups. — Barmar, Nov 19 '19 at 22:44
What is the datatype of columns `start_time` and `end_time`: are they `datetime`s, or some string datatype? — GMB, Nov 19 '19 at 22:48
@Dynastywarriorlord07: is it possible that two occurences of the same `id` would overlap (ie, for the same id, the next start time occurs before the end time of the ent time of the current record)? — GMB, Nov 19 '19 at 23:06

score 1 · Answer 1 · answered Nov 19 '19 at 22:44

1

I agree with Barmar and encourage you to revisit your data model. I would expect start time and end time to be in the same table.

And while the existing ID may be for something like user_id, if that ID is duplicated in this table then there should be some other unique identifier, maybe transaction_id, that uniquely identifies each record.

answered Nov 19 '19 at 22:44

alexherm

1,362
2
18
31

you must track start and end time within the same row. – WEBjuju Nov 19 '19 at 22:50
I wouldn't say that is an absolute requirement but it certainly makes things a lot easier. – alexherm Nov 19 '19 at 23:04

LukStorms · Answer 2 · 2019-11-20T16:33:45.283

Since the id's are the same and the end date is higher than the start date.

If those times are strings then use STR_TO_DATE

SELECT a.id, a.`start time`, MIN(b.`end time`) AS `end time`
FROM A a
LEFT JOIN B b 
  ON b.id = a.id
 AND STR_TO_DATE(b.`end time`, '%Y-%m-%d %H:%i') > STR_TO_DATE(a.`start time`, '%Y-%m-%d %H:%i')
GROUP BY a.id, a.`start time`
ORDER BY a.id, a.`start time`;

If those are timestamps

SELECT a.id, a.`start time`, MIN(b.`end time`) AS `end time`
FROM A a
LEFT JOIN B b
  ON b.id = a.id
 AND b.`end time` > a.`start time`
GROUP BY a.id, a.`start time`
ORDER BY a.id, a.`start time`;

A test on rextester here

If there are many timestamps per B.id?
Then it might be more performant if the range is limited to a day or less.

SELECT a.id, a.`start time`, MIN(b.`end time`) AS `end time`
FROM A a
LEFT JOIN B b
  ON b.id = a.id
 AND b.`end time` > a.`start time` 
 AND b.`end time` < TIMESTAMPADD(HOUR,24,a.`start time`)
GROUP BY a.id, a.`start time`
ORDER BY a.id, a.`start time`;

Sure, but what if an ID has multiple entries for the same day? — alexherm, Nov 19 '19 at 23:16
Indeed. Thx. But now with a GROUP BY and a MIN it should be fine now. — LukStorms, Nov 19 '19 at 23:37
@Dynastywarriorlord07 Added an extra one. Since I haven't a clue about your real data. — LukStorms, Nov 20 '19 at 16:35

GMB · Answer 3 · 2019-11-19T23:35:14.233

Assuming that there are no overlaps between start/end times of the same id, you could join the tables, with a join condition based on a correlated subquery that ensures that the record of tableb that has the closest end_time after the current start_time of tablea is picked:

select
    a.*,
    b.end_time
from
    tablea a
    inner join tableb b
        on  b.id = a.id
        and b.end_time = (
            select min(b1.end_time)
            from tableb b1 
            where b1.id = a.id and b1.end_time > a.start_time
        )

Demo on DB Fiddle:

id | start_time       | end_time        
-: | :--------------- | :---------------
 1 | 2016-02-06 17:03 | 2016-02-06 18:32
 1 | 2016-03-09 18:09 | 2016-03-09 21:11
 2 | 2017-02-07 23:34 | 2017-02-08 01:22
 3 | 2016-02-07 19:12 | 2016-02-07 21:32
 3 | 2016-02-07 23:52 | 2016-02-08 02:11

What's the approach to joining these 2 tables?

3 Answers3