1

Here is my problem, I have a MYSQL table with the following columns and data examples :

id | user | starting date | ending date | activity code
1 | Andy | 2010-04-01 | 2010-05-01 | 3
2 | Andy | 1988-11-01 | 1991-03-01 | 3
3 | Andy | 2005-06-01 | 2008-08-01 | 3
4 | Andy | 2005-08-01 | 2008-11-01 | 3
5 | Andy | 2005-06-01 | 2010-05-01 | 4
6 | Ben  | 2010-03-01 | 2011-06-01 | 3
7 | Ben  | 2010-03-01 | 2010-05-01 | 4
8 | Ben  | 2005-04-01 | 2011-05-01 | 3

As you can see in this table users can have same activity code and similar dates or periods. And For a same user, periods can overlap others or not. It is also possible to have several overlap periods in the table.

What I want is a MYSQL QUERY to get the following result :

new id | user | starting date | ending date | activity code
1 | Andy | 2010-04-01 | 2010-05-01 | 3 => ok, no overlap period
2 | Andy | 1988-11-01 | 1991-03-01 | 3 => ok, no overlap period
3 | Andy | 2005-06-01 | 2008-11-01 | 3 => same user, same activity but ending date coming from row 4 as extended period 
4 | Andy | 2005-06-01 | 2010-05-01 | 4 => ok other activity code
5 | Ben  | 2005-04-01 | 2011-06-01 | 3 => ok other user, but as overlap period rows 6 and 8 for the same user and activity, I take the widest range
6 | Ben  | 2010-03-01 | 2010-05-01 | 4 => ok other activity for second user

In other words, for a same user and activity code, if there is no overlap, I need the starting and ending dates as they are. If there is an overlap for a same user and activity code, I need the lower starting date and the higher ending date coming from the different related rows. I need this for all the users and activity code of the table and in SQL for MYSQL.

I hope it is clear enough and someone can help me because I try different codes from solutions supplied on this site and others without success.

PM 77-1
  • 12,933
  • 21
  • 68
  • 111

2 Answers2

0

I have somewhat convoluted (strictly MySQL-specific) solution:

SET @user = NULL;
SET @activity = NULL;
SET @interval_id = 0;

SELECT
  MIN(inn.`starting date`) AS start,
  MAX(inn.`ending date`) AS end,
  inn.user,
  inn.`activity code`
  FROM
    (SELECT
       IF(user <> @user OR `activity code` <> @activity,  
          @interval_id := @interval_id  + 1, NULL),
       IF(user <> @user OR `activity code` <> @activity,  
          @interval_end := STR_TO_DATE('',''), NULL),
       @user := user,
       @activity := `activity code`,
       @interval_id := IF(`starting date` > @interval_end,
                          @interval_id + 1,
                          @interval_id) AS interval_id,
       @interval_end := IF(`starting date` < @interval_end,
                           GREATEST(@interval_end, `ending date`),
                           `ending date`) AS interval_end,
       t.*
     FROM Table1 t
     ORDER BY t.user, t.`activity code`, t.`starting date`, t.`ending date`) inn
GROUP BY inn.user, inn.`activity code`, inn.interval_id;

The underlying idea was shamelessly borrowed from the 1st answer to this question.

You can use this SQL Fiddle to review the results and try different source data.

Community
  • 1
  • 1
PM 77-1
  • 12,933
  • 21
  • 68
  • 111
  • Thanks a lot PM it works great. Honestly, it was really too complicated for me... surely I have a lot in SQL to learn again. I have only adapted your code to consider case where there is no ending date ('0000-00-00' as ending date in the table) occuring when it is still in progress. Thanks again – user3115576 Dec 19 '13 at 10:23
0

Here is a solution - (see http://sqlfiddle.com/#!2/fda3d/15)

SELECT DISTINCT summarized.`user`
  , summarized.activity_code
  , summarized.true_begin
  , summarized.true_end
FROM (
  SELECT t1.id,t1.`user`,t1.activity_code
    , MIN(LEAST(t1.`starting`, COALESCE(overlap.`starting` ,t1.`starting`))) as true_begin
    , MAX(GREATEST(t1.`ending`, COALESCE(overlap.`ending` ,t1.`ending`))) as true_end
  FROM t1
  LEFT JOIN t1 AS overlap
    ON t1.`user` = overlap.`user`
      AND t1.activity_code = overlap.activity_code
      AND overlap.`ending` >= t1.`starting`
      AND overlap.`starting` <= t1.`ending`
      AND overlap.id <> t1.id
  GROUP BY t1.id, t1.`user`, t1.activity_code) AS summarized;

I am not sure how it will perform with a large data set with many overlaps. You will definitely need an index on the user and activity_code fields - probably the starting and ending date fields also as part of that index.

AgRizzo
  • 5,261
  • 1
  • 13
  • 28
  • I have tried it but it does not give the expected result as the PM77-1 does, but thank you so much for your time. – user3115576 Dec 19 '13 at 10:26
  • @user3115576 - really not the correct results? In the link I gave you, didn't I return the result set you needed or is there additional data that you had that this doesn't work with? – AgRizzo Dec 19 '13 at 11:48
  • My sincere apologies AgRizzo... it works also. I gave a simple example with explicit columns names and data. When I tried to apply your code I made some adaptation errors. I have fixed them now and it works as well as the PM code... I will monitor the performance between both solutions when my database will grow up. Many thanks and my apologies again – user3115576 Dec 19 '13 at 14:40