I'm trying to merge overlapping start-end timestamps into single timespan. Similar problem is available here on SO. I would like to merge timestamps separately for every user I have in data.
Sample data:
-- drop table if exists app_log;
create table app_log (
user_id int,
login_time timestamp,
logout_time timestamp
);
insert into app_log values
(1, '2014-01-01 08:00', '2014-01-01 10:00'), /* here we start */
(1, '2014-01-01 09:10', '2014-01-01 09:59'), /* fully included in previous interval */
(1, '2014-01-01 10:00', '2014-01-01 10:48'), /* continuing first interval */
(1, '2014-01-01 10:40', '2014-01-01 10:49'), /* continuing previous interval */
(1, '2014-01-01 10:55', '2014-01-01 11:00'), /* isolated interval */
(2, '2014-01-01 09:00', '2014-01-01 11:00'), /* 2nd user is shifted by one hour */
(2, '2014-01-01 10:10', '2014-01-01 10:59'), /* to simulate overlaps with 1st user */
(2, '2014-01-01 11:00', '2014-01-01 11:48'),
(2, '2014-01-01 11:40', '2014-01-01 11:49'),
(2, '2014-01-01 11:55', '2014-01-01 12:00')
;
Required results:
used_id login_time logout_time
1 2014-01-01 08:00 2014-01-01 10:49 /* Merging first 4 lines */
1 2014-01-01 10:55 2014-01-01 11:00 /* 5 th line is isolated */
2 2014-01-01 09:00 2014-01-01 11:49 /* Merging lines 6-9 */
2 2014-01-01 11:55 2014-01-01 12:00 /* last line is isolated */
I've tried to use on of the solutions provided in the mentioned question, but it does not return correct answer even for single user:
with recursive
in_data as (select login_time as d1, logout_time as d2 from app_log where user_id = 1)
, dateRanges (ancestorD1, parentD1, d2, iter) as
(
--anchor is first level of collapse
select
d1 as ancestorD1,
d1 as parentD1,
d2,
cast(0 as int) as iter
from in_data
--recurse as long as there is another range to fold in
union all
select
tLeft.ancestorD1,
tRight.d1 as parentD1,
tRight.d2,
iter + 1 as iter
from dateRanges as tLeft join in_data as tRight
--join condition is that the t1 row can be consumed by the recursive row
on tLeft.d2 between tRight.d1 and tRight.d2
--exclude identical rows
and not (tLeft.parentD1 = tRight.d1 and tLeft.d2 = tRight.d2)
)
select
ranges1.*
from dateRanges as ranges1
where not exists (
select 1
from dateRanges as ranges2
where ranges1.ancestorD1 between ranges2.ancestorD1 and ranges2.d2
and ranges1.d2 between ranges2.ancestorD1 and ranges2.d2
and ranges2.iter > ranges1.iter
);
Results in:
ancestord1 parentd1 d2 iter
2014-01-01 10:55:00;2014-01-01 10:55:00;2014-01-01 11:00:00;0
2014-01-01 08:00:00;2014-01-01 10:40:00;2014-01-01 10:49:00;2
2014-01-01 09:10:00;2014-01-01 10:40:00;2014-01-01 10:49:00;3
What is wrong with the query above and how can I extend it to get results by users? Is there better solution for this problem in PostgreSQL?