4

I have a table with overlapping time periods. I would like to group the overlapping time events which are continuous (i.e. not separated with time gaps).

ID  StartDate            EndDate
1   2013-01-30  2013-01-31
2   2013-01-31  2013-01-31
3   2013-01-29  2013-01-31
4   2013-01-25  2013-01-28
5   2013-01-29  2013-01-30
6   2013-02-01  2013-02-01
7   2013-01-31  2013-02-02
8   2013-02-04  2013-02-05
9   2013-02-05  2013-02-06
10  2013-02-08  2013-02-09

       01-24   01-25   01-26   01-27   01-28   01-29   01-30    01-31   02-01   02-02   02-03   02-04   02-05   02-06   02-07   02-08   02-09  
1                                                          --------------    
2                                                                   -----                                                                                                  
3                                                   ---------------------
4                  -----------------------------
5                                                   ------------ 
6                                                                           -----                            
7                                                                    --------------------                             
8                                                                                                    -------------                                              
9                                                                                                             -------------                                                       
10                                                                                                                                   --------------                                                                    

As a result I would like to have following four time groups:

group 1 (IDs: 1, 2, 3, 5, 6, 7)

group 2 (Id: 4)

group 3 (IDs: 8, 9)

group 4: (Id: 10)

Is there an easy way in Sql of doing it? Here is a create sql for my example table:

DROP TABLE IF EXISTS tb_data_log;
CREATE TABLE tb_data_log (
  `event_id` int(10) unsigned NOT NULL,
  `startdate` date DEFAULT NULL,
  `enddate` date DEFAULT NULL
);

INSERT INTO tb_data_log VALUES (1, '2013-01-30', '2013-01-31');
INSERT INTO tb_data_log VALUES (2, '2013-01-31', '2013-01-31');
INSERT INTO tb_data_log VALUES (3, '2013-01-29', '2013-01-31');
INSERT INTO tb_data_log VALUES (4, '2013-01-25', '2013-01-28');
INSERT INTO tb_data_log VALUES (5, '2013-01-29', '2013-01-30');
INSERT INTO tb_data_log VALUES (6, '2013-02-01', '2013-02-01');
INSERT INTO tb_data_log VALUES (7, '2013-01-31', '2013-02-02');
INSERT INTO tb_data_log VALUES (8, '2013-02-04', '2013-02-05');
INSERT INTO tb_data_log VALUES (9, '2013-02-05', '2013-02-06');
INSERT INTO tb_data_log VALUES (10, '2013-02-08', '2013-02-09');

EDIT #1:

It looks like the problem is a bit hard to understand, here is the desired output:

GroupID StartDate   EndDate     Overlapped Id
      1 2013-01-29  2013-02-02  1, 2, 3, 5, 6, 7
      2 2013-01-25  2013-01-28  4      
      3 2013-02-04  2013-02-06  8,9
      4 2013-02-08  2013-02-09  10          
Strawberry
  • 33,750
  • 13
  • 40
  • 57
marebine
  • 120
  • 1
  • 2
  • 7
  • 1
    It is an interesting problem, but I am afraid I cannot see a simple way to do this without access to some kind of recursive function (which MySQL doesn't support). I am afraid it would probably be necessary to code this with a normal language and probably multiple queries. – Kickstart Sep 16 '14 at 12:00

2 Answers2

1

Here is a solution. It should work and uses no stored procedure:

select per_start,per_end,group_concat(contained.event_id) from tb_data_log contained,(
select distinct start.startdate as per_start,
finish.enddate as per_end
 from tb_data_log start join tb_data_log finish
on start.startdate <= finish.enddate -- first find all possible periods
where not exists (-- make sure there are two events in the period which do not overlap and between whom there is no event.
select * from tb_data_log a, tb_data_log b where 
a.enddate < b.startdate and
a.startdate>=start.startdate and
b.enddate<=finish.enddate and not exists 
(
select * from tb_data_log inside where
inside.event_id <> a.event_id
and inside.event_id<> b.event_id
and inside.enddate > a.enddate and inside.startdate < b.startdate
)
)
and not exists (-- make sure there is no longer period
select * from tb_data_log later where later.startdate<=finish.enddate and later.enddate >finish.enddate
)
and not exists (-- make sure there is no longer period
select * from tb_data_log earlier where earlier.startdate<start.startdate and earlier.enddate >=start.startdate
)
) periods where contained.enddate<=per_end and contained.startdate>=per_start
group by per_start,per_end

The idea is to first find all possible periods by joining the table with itself. Then for each period P make sure that there is no pair of periods A,B so that A is before B (no overlapping), both are contained in P, and there is no event between them. Also make sure that is not the longest possible period.

Here is the previous solution I posted, it is worse. Keeping it for reference

It is probably not very efficient. I used the selected answer from here: How to get list of dates between two dates in mysql select query So note that this query will stop working after 300 years!!!

select per_start,per_end,group_concat(contained.event_id) from tb_data_log contained,(
select distinct start.startdate as per_start,
finish.enddate as per_end
 from tb_data_log start, tb_data_log finish
where start.startdate <= finish.enddate -- first find all possible periods
and not exists (-- make sure there are no two consecutive days that are not contained in some      event period.
select * from 
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) day1, adddate('1970-01-     01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0+1) day2 from
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union   select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union  select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union   select 6 union select 7 union select 8 union select 9) t3,
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union  select 6 union select 7 union select 8 union select 9) t4) v
where day1 between start.startdate and finish.enddate and day2 between start.startdate and  finish.enddate 
and not exists (
select * from tb_data_log where tb_data_log.startdate <= cast(day1 as date) and    tb_data_log.enddate >= cast(day2 as date)
)
)
and not exists (-- make sure there is no longer period
select * from tb_data_log later where later.startdate<=finish.enddate and later.enddate    >finish.enddate
)
and not exists (-- make sure there is no longer period
select * from tb_data_log earlier where earlier.startdate<start.startdate and earlier.enddate    >=start.startdate
)
) periods where contained.enddate<=per_end and contained.startdate>=per_start
group by per_start,per_end

The idea is to first find all possible periods by joining the table with itself. Then for each period make sure that there is no pair of consecutive days that are contained in that period but are not covered by some event period in the table. Also make sure that is not the longest possible period.

I think the performance of this query can be somewhat improved.

Community
  • 1
  • 1
alexander zak
  • 929
  • 6
  • 13
0

something close to the answer (just close)

select 
    tmp.group_id, group_concat(tmp.id)
from
    (select 
        a.event_id as 'group_id', b.event_id as 'id'
    from
        tb_data_log a
    LEFT join tb_data_log b ON (a.startdate BETWEEN b.startdate AND b.enddate)
        or (a.enddate BETWEEN b.startdate AND b.enddate)) as tmp
group by group_id
Melon
  • 874
  • 8
  • 17
  • it is nowhere close to desired result - items grouped by overlaps. – Bulat Sep 16 '14 at 09:59
  • I edited the answer to get a bit close to the real one. it may help user2724602 or another to get to the final answer – Melon Sep 16 '14 at 10:50
  • Thank you all for your answers, I have edited the question and added a sample table with the desired output for easier understanding – marebine Sep 16 '14 at 11:50