0

Say I have 2 tables A and B which contain information for start and end times respectively. The primary key is a combination of id and the timestamp. Thus, no 2 records can have the same id and timestamp

A

id | start time
1 | 2016-02-06 17:03
1 | 2016-03-09 18:09
2 | 2017-02-07 23:34
3 | 2016-02-07 19:12
3 | 2016-02-07 23:52
...

B

id | end time
1 | 2016-02-06 18:32
1 | 2016-03-09 21:11
2 | 2017-02-08 01:22
3 | 2016-02-07 21:32
3 | 2016-02-08 02:11
...

My end result should be something like

id | start time | end time
1 | 2016-02-06 17:03 | 2016-02-06 18:32
1 | 2016-03-09 18:09 | 2016-03-09 21:11
2 | 2017-02-07 23:34 | 2017-02-08 01:22
3 | 2016-02-07 19:12 | 2016-02-07 21:32
3 | 2016-02-07 23:52 | 2016-02-08 02:11
...

Obviously I can't join on just ID as the ids 1 and 3 each appear twice. I can't join on the day either as the 3rd and 5th records span across 2 different days. So is there a way to join these 2 tables? Any help would be much appreciated! Thanks!

  • Edit your question and show the results that you want. – Gordon Linoff Nov 19 '19 at 22:34
  • 1
    What is the common relationship between the tables that you want to use to join them? – Barmar Nov 19 '19 at 22:35
  • 1
    Why are the start and end times in different tables in the first place? – Barmar Nov 19 '19 at 22:36
  • I'm confused why exactly you can't just join on ID. That wouldn't cause you to have duplicated records. – SchrodingersStat Nov 19 '19 at 22:36
  • 1
    @SchrodingersStat You'll get a cross product between all the start and end times. – Barmar Nov 19 '19 at 22:37
  • See https://stackoverflow.com/questions/34667070/update-a-mysql-table-with-record-rankings-within-groups for how to add a counter ordered by time within each `id` group in the two tables. Then you can join on `id` and `counter`. – Barmar Nov 19 '19 at 22:39
  • Why is the fourth row shown for A `('3', 2016-02-07 19:12')` matched with the fourth row shown for B and not the fifth row ? There is probably example data that will show that just having start time before end time would not eliminate all overlap, and we would still have potential for cross product. As @Barmar suggests, I'd use inline view to assign a counter, from 1 to n for each `id` value, ordering by datetime; then join on combination of `id` and `counter`. – spencer7593 Nov 19 '19 at 22:40
  • 1
    It would also be better if the tables had a separate key that identifies the event uniquely, and can then be used to join the tables. – Barmar Nov 19 '19 at 22:40
  • @SchrodingersStat If I join based just on ID, then each record with id 1 in A will be joined with each record with id 1 in B. Essentially we will have a record in the joint table with start time = March 9 2016 18:09 but end time = February 6 2016 18:32. In other words, it implies that the end time is before the start time – Dynastywarriorlord07 Nov 19 '19 at 22:41
  • @spencer7593 I think he wants to pair each start time with the closest following end time for the same ID. – Barmar Nov 19 '19 at 22:43
  • @Dynastywarriorlord07 Ahh gotcha, I did not look at your table close enough. – SchrodingersStat Nov 19 '19 at 22:44
  • 1
    If you're using MySQL 8, you can use a window function to assign rank within the groups. – Barmar Nov 19 '19 at 22:44
  • What is the datatype of columns `start_time` and `end_time`: are they `datetime`s, or some string datatype? – GMB Nov 19 '19 at 22:48
  • @GMB Let's say they are timestamps – Dynastywarriorlord07 Nov 19 '19 at 23:02
  • @Dynastywarriorlord07: is it possible that two occurences of the same `id` would overlap (ie, for the same id, the next start time occurs before the end time of the ent time of the current record)? – GMB Nov 19 '19 at 23:06

3 Answers3

1

I agree with Barmar and encourage you to revisit your data model. I would expect start time and end time to be in the same table.

And while the existing ID may be for something like user_id, if that ID is duplicated in this table then there should be some other unique identifier, maybe transaction_id, that uniquely identifies each record.

alexherm
  • 1,362
  • 2
  • 18
  • 31
1

Since the id's are the same and the end date is higher than the start date.

If those times are strings then use STR_TO_DATE

SELECT a.id, a.`start time`, MIN(b.`end time`) AS `end time`
FROM A a
LEFT JOIN B b 
  ON b.id = a.id
 AND STR_TO_DATE(b.`end time`, '%Y-%m-%d %H:%i') > STR_TO_DATE(a.`start time`, '%Y-%m-%d %H:%i')
GROUP BY a.id, a.`start time`
ORDER BY a.id, a.`start time`;

If those are timestamps

SELECT a.id, a.`start time`, MIN(b.`end time`) AS `end time`
FROM A a
LEFT JOIN B b
  ON b.id = a.id
 AND b.`end time` > a.`start time`
GROUP BY a.id, a.`start time`
ORDER BY a.id, a.`start time`;

A test on rextester here

If there are many timestamps per B.id?
Then it might be more performant if the range is limited to a day or less.

SELECT a.id, a.`start time`, MIN(b.`end time`) AS `end time`
FROM A a
LEFT JOIN B b
  ON b.id = a.id
 AND b.`end time` > a.`start time` 
 AND b.`end time` < TIMESTAMPADD(HOUR,24,a.`start time`)
GROUP BY a.id, a.`start time`
ORDER BY a.id, a.`start time`;
LukStorms
  • 28,916
  • 5
  • 31
  • 45
0

Assuming that there are no overlaps between start/end times of the same id, you could join the tables, with a join condition based on a correlated subquery that ensures that the record of tableb that has the closest end_time after the current start_time of tablea is picked:

select
    a.*,
    b.end_time
from
    tablea a
    inner join tableb b
        on  b.id = a.id
        and b.end_time = (
            select min(b1.end_time)
            from tableb b1 
            where b1.id = a.id and b1.end_time > a.start_time
        )

Demo on DB Fiddle:

id | start_time       | end_time        
-: | :--------------- | :---------------
 1 | 2016-02-06 17:03 | 2016-02-06 18:32
 1 | 2016-03-09 18:09 | 2016-03-09 21:11
 2 | 2017-02-07 23:34 | 2017-02-08 01:22
 3 | 2016-02-07 19:12 | 2016-02-07 21:32
 3 | 2016-02-07 23:52 | 2016-02-08 02:11
GMB
  • 216,147
  • 25
  • 84
  • 135