0

I am working on a "counting days" problem almost identical to this one. I have a list of date(s), and need to count how many days used excluding duplicate, and handling the gaps. Same input and output.

From: Markus Jarderot

Input
ID   d1           d2
 1   2011-08-01   2011-08-08
 1   2011-08-02   2011-08-06
 1   2011-08-03   2011-08-10
 1   2011-08-12   2011-08-14
 2   2011-08-01   2011-08-03
 2   2011-08-02   2011-08-06
 2   2011-08-05   2011-08-09

Output
ID   hold_days
 1          11
 2           8

SQL to find time elapsed from multiple overlapping intervals

But for the life of me I couldn't understand Markus Jarderot's solution.

SELECT DISTINCT
    t1.ID,
    t1.d1 AS date,
    -DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) AS n
FROM Orders t1
LEFT JOIN Orders t2                   -- Join for any events occurring while this
    ON t2.ID = t1.ID                  -- is starting. If this is a start point,
    AND t2.d1 <> t1.d1                -- it won't match anything, which is what
    AND t1.d1 BETWEEN t2.d1 AND t2.d2 -- we want.
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0

Why is DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) picking from the min(d1) from the entire list? Is that regardless of ID.

And what does t1.d1 BETWEEN t2.d1 AND t2.d2 do? Is that to ensure only overlapped interval are calculated?

Same thing with group by, I think because if in the event the same identical period will be discarded? I tried to trace the solution by hand but getting more confused.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
George
  • 4,514
  • 17
  • 54
  • 81
  • Which [DBMS](https://en.wikipedia.org/wiki/DBMS) product are you using? "SQL" is just a query language, not the name of a specific database product (and `datediff` is a non-standard function) –  Jan 30 '18 at 22:09
  • @a_horse_with_no_name I am using SAS SQL pass through to oracle DB, but a general walk through of this solution would be helpful. – George Jan 30 '18 at 22:12
  • You can't be using Oracle if `datediff()` works –  Jan 30 '18 at 22:12
  • I haven't actually implement it / port it yet, as I am trying to understand what the solution is trying to do. – George Jan 30 '18 at 22:13
  • @George . . . Your code will work neither in Oracle nor SAS. It looks like SQL Server code. – Gordon Linoff Jan 30 '18 at 22:48

4 Answers4

2

This is mostly a duplicate of my answer here (including explanation) but with the inclusion of grouping on an id column. It should use a single table scan and does not require a recursive sub-query factoring clause (CTE) or self joins.

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE your_table ( id, usr, start_date, end_date ) AS
  SELECT 1, 'A', DATE '2017-06-01', DATE '2017-06-03' FROM DUAL UNION ALL
  SELECT 1, 'B', DATE '2017-06-02', DATE '2017-06-04' FROM DUAL UNION ALL -- Overlaps previous
  SELECT 1, 'C', DATE '2017-06-06', DATE '2017-06-06' FROM DUAL UNION ALL
  SELECT 1, 'D', DATE '2017-06-07', DATE '2017-06-07' FROM DUAL UNION ALL -- Adjacent to previous
  SELECT 1, 'E', DATE '2017-06-11', DATE '2017-06-20' FROM DUAL UNION ALL
  SELECT 1, 'F', DATE '2017-06-14', DATE '2017-06-15' FROM DUAL UNION ALL -- Within previous
  SELECT 1, 'G', DATE '2017-06-22', DATE '2017-06-25' FROM DUAL UNION ALL
  SELECT 1, 'H', DATE '2017-06-24', DATE '2017-06-28' FROM DUAL UNION ALL -- Overlaps previous and next
  SELECT 1, 'I', DATE '2017-06-27', DATE '2017-06-30' FROM DUAL UNION ALL
  SELECT 1, 'J', DATE '2017-06-27', DATE '2017-06-28' FROM DUAL UNION ALL -- Within H and I
  SELECT 2, 'K', DATE '2011-08-01', DATE '2011-08-08' FROM DUAL UNION ALL -- Your data below
  SELECT 2, 'L', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
  SELECT 2, 'M', DATE '2011-08-03', DATE '2011-08-10' FROM DUAL UNION ALL
  SELECT 2, 'N', DATE '2011-08-12', DATE '2011-08-14' FROM DUAL UNION ALL
  SELECT 3, 'O', DATE '2011-08-01', DATE '2011-08-03' FROM DUAL UNION ALL
  SELECT 3, 'P', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
  SELECT 3, 'Q', DATE '2011-08-05', DATE '2011-08-09' FROM DUAL;

Query 1:

SELECT id,
       SUM( days ) AS total_days
FROM   (
  SELECT id,
         dt - LAG( dt ) OVER ( PARTITION BY id
                               ORDER BY dt ) + 1 AS days,
         start_end
  FROM   (
    SELECT id,
           dt,
           CASE SUM( value ) OVER ( PARTITION BY id
                                    ORDER BY dt ASC, value DESC, ROWNUM ) * value
             WHEN 1 THEN 'start'
             WHEN 0 THEN 'end'
           END AS start_end
    FROM   your_table
    UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
  )
  WHERE start_end IS NOT NULL
)
WHERE start_end = 'end'
GROUP BY id

Results:

| ID | TOTAL_DAYS |
|----|------------|
|  1 |         25 |
|  2 |         13 |
|  3 |          9 |
MT0
  • 143,790
  • 11
  • 59
  • 117
0

The brute force method is to create all days (in a recursive query) and then count:

with dates(id, day, d2) as
(
  select id, d1 as day, d2 from mytable
  union all
  select id, day + 1, d2 from dates where day < d2
)
select id, count(distinct day)
from dates
group by id
order by id;

Unfortunately there is a bug in some Oracle versions and recursive queries with dates don't work there. So try this code and see whether it works in your system. (I have Oracle 11.2 and the bug still exists there; so I guess you need Oracle 12c.)

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
0

If all your intervals start at different dates, consider them in ascending order by d1 counting how many days are from d1 to the next interval. You can discard an interval of it is contained in another one. The last interval won't have a follower.

This query should give you how many days each interval give

select a.id, a.d1,nvl(min(b.d1), a.d2) - a.d1
from orders a
left join orders b
on a.id = b.id and a.d1 < b.d1 and a.d2 between b.d1 and b.d2
group by a.id, a.d1

Then group by id and sum days

1010
  • 1,779
  • 17
  • 27
0

I guess Markus' idea is to find all starting points that are not within other ranges and all ending points that aren't. Then just take the first starting point till the first ending point, then the next starting point till the next ending point, etc. As Markus isn't using a window function to number starting and ending points, he must find a more complicated way to achieve this. Here is the query with ROW_NUMBER. Maybe this gives you a start what to look for in Markus' query.

select startpoint.id, sum(endpoint.day - startpoint.day)
from
(
  select id, d1 as day, row_number() over (partition by id order by d1) as rn
  from mytable m1
  where not exists
  (
    select *
    from mytable m2
    where m1.id = m2.id 
    and m1.d1 > m2.d1 and m1.d1 <= m2.d2
  )
) startpoint
join
(
  select id, d2 as day, row_number() over (partition by id order by d1) as rn
  from mytable m1
  where not exists
  (
    select *
    from mytable m2
    where m1.id = m2.id 
    and m1.d2 >= m2.d1 and m1.d2 < m2.d2
  )
) endpoint on endpoint.id = startpoint.id and endpoint.rn = startpoint.rn
group by startpoint.id
order by startpoint.id;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73