Finding a previous, non-contiguous date using SQL

Question

Suppose a table, tableX, like this (the date column is named date1 in the queries below):

|     date    | hours |
| 2014-07-02  |  10   |
| 2014-07-03  |  10   |
| 2014-07-07  |  20   |
| 2014-07-08  |  40   |

The dates are 'workdays' -- that is, no weekends or holidays.

I want to find the increase in hours between consecutive workdays, like this:

|     date    | hours |
| 2014-07-03  |   0   |
| 2014-07-07  |  10   |
| 2014-07-08  |  20   |

The challenge is dealing with the gaps. If there were no gaps, something like

SELECT t1.date1 AS 'first day', t2.date1 AS 'second day', (t2.hours - t1.hours)
 FROM tableX t1
 LEFT JOIN tableX t2 ON t2.date1 = DATE_ADD(t1.date1, INTERVAL 1 DAY)
 ORDER BY t2.date1;

would get it done, but that doesn't work in this case as there is a gap between 2014-07-03 and 2014-07-07.

The real dataset is several years' worth, so I think so, but can't be 100% sure. – Jeff Leyser Jul 09 '14 at 22:18

score 0 · Answer 1 · Gordon Linoff · 2014-07-10T02:51:05.337

Just use a correlated subquery instead. You have two fields, so you can do this with two correlated subqueries, or a correlated subquery with a join back to the table. Here is the first version:

SELECT t.date1 AS `first day`,
       -- earliest date later than the current row's date
       (SELECT t2.date1
        FROM tableX t2
        WHERE t2.date1 > t.date1
        ORDER BY t2.date1 ASC
        LIMIT 1
       ) AS `next day`,
       -- hours on that next date, minus the current row's hours
       (SELECT t2.hours
        FROM tableX t2
        WHERE t2.date1 > t.date1
        ORDER BY t2.date1 ASC
        LIMIT 1
       ) - t.hours AS `increase`
FROM tableX t
ORDER BY t.date1;
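
The second version mentioned above (one correlated subquery plus a join back to the table) is not shown in the answer. A rough sketch of what it might look like, assuming the same tableX(date1, hours) layout; the aliases tn and increase are made up for illustration:

```sql
-- Sketch only: pair each row with the row holding the next later date.
SELECT t.date1  AS `first day`,
       tn.date1 AS `next day`,
       tn.hours - t.hours AS `increase`
FROM tableX t
JOIN tableX tn
  -- the correlated subquery finds the smallest date after t.date1
  ON tn.date1 = (SELECT MIN(t2.date1)
                 FROM tableX t2
                 WHERE t2.date1 > t.date1)
ORDER BY t.date1;
```

Because this is an inner join, the last date (which has no later date) drops out of the result instead of showing a NULL next day.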

edited Jul 10 '14 at 02:51

answered Jul 09 '14 at 21:18

This basically works (DESC should be ASC), but is really slow on my actual dataset (~8M records), so doesn't quite get the job done. – Jeff Leyser Jul 09 '14 at 22:20
Try adding an index on `tableX(date1)`. – Gordon Linoff Jul 10 '14 at 02:51
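
The index suggested in the comment above could be created roughly like this. The index name idx_tablex_date1 is made up, and whether the optimizer actually uses it on an ~8M-row table is something to verify rather than assume:

```sql
-- Hypothetical index name; any unused name works.
CREATE INDEX idx_tablex_date1 ON tableX (date1);

-- Optional check: EXPLAIN reports whether the subquery can use the index.
EXPLAIN
SELECT t.date1,
       (SELECT t2.date1
        FROM tableX t2
        WHERE t2.date1 > t.date1
        ORDER BY t2.date1 ASC
        LIMIT 1) AS `next day`
FROM tableX t;
```

An index on (date1, hours) instead may let the hours lookup be answered from the index alone, at the cost of a somewhat larger index.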

score 0 · Answer 2 · edited May 23 '17 at 12:11

Another alternative is to rank the rows by date and then subtract the previous workday's hours from the current workday's hours.

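 SELECT
   ranked_t1.date1 date,
   ranked_t1.hours - ranked_t2.hours hours
 FROM
 (
   -- number the rows in date order: 1, 2, 3, ...
   -- (alias is rn, not rank: RANK is a reserved word in MySQL 8.0)
   SELECT t.*,
       @rownum := @rownum + 1 AS rn
  FROM (SELECT * FROM tableX ORDER BY date1) t,
       (SELECT @rownum := 0) r
 ) ranked_t1
 INNER JOIN
 (
   -- same numbering again, using a separate counter
   SELECT t.*,
       @rownum2 := @rownum2 + 1 AS rn
  FROM (SELECT * FROM tableX ORDER BY date1) t,
       (SELECT @rownum2 := 0) r
 ) ranked_t2
 -- pair each row with the one numbered just before it
 ON ranked_t2.rn = ranked_t1.rn - 1;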

Note:

Obviously an index on tableX.date1 would speed up the query.

Instead of a correlated subquery, a join is used in the above query.

Reference:

MySQL rank function on SO

score 0 · Answer 3 · answered Jul 10 '14 at 22:13

Unfortunately, MySQL doesn't (yet) have analytic functions which would allow you to access the "previous row" or the "next row" of the data stream. However, you can duplicate it with this:

select  h2.LogDate, h2.Hours - h1.Hours as Added_Hours
from    Hours h1
left join Hours h2
    on  h2.LogDate =(
            select Min( LogDate )
            from   Hours
            where  LogDate > h1.LogDate )
where h2.LogDate is not null;

Make sure the date field is indexed. Without an index, this query will take forever.
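
For readers on MySQL 8.0 or later, which added window functions after this answer was written, LAG() reads the previous row directly. A minimal sketch against the question's tableX(date1, hours); the names d and hours_increase are made up for illustration:

```sql
-- Requires MySQL 8.0 or later (window functions).
SELECT date1, hours_increase
FROM (
    SELECT date1,
           -- LAG(hours) is the hours value of the previous row in date order
           hours - LAG(hours) OVER (ORDER BY date1) AS hours_increase
    FROM tableX
) AS d
-- the earliest date has no previous row, so its difference is NULL
WHERE hours_increase IS NOT NULL
ORDER BY date1;
```

On the four sample rows in the question this should give 0, 10 and 20 for 2014-07-03, 2014-07-07 and 2014-07-08. It needs no self-join, since gaps between dates do not matter once the rows are ordered.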
