I have a table like this:
ID BEGIN END
If there are overlapping episodes for the same ID (like 2000-01-01
- 2001-12-31
and 2000-06-01
- 2002-06-31
) I would like the rows to be merged, using MIN(BEGIN)
, MAX(END)
.
The same should be done if episodes are in a direct succession (like 2000-01-01
- 2000-06-31
and 2000-07-01
- 2000-12-31
).
If there are "missing" days between episodes (like 2000-01-01
- 2000-06-15
and 2000-07-01
- 2000-12-31
), they should not be merged.
How can this be achieved?
Currently my code looks like this:
SELECT "ID", MIN("BEGIN"), MAX("END")
FROM ...
GROUP BY "ID"
but of course, this doesn't fulfill the last condition (not to merge if there are "missing" days).
Thank you in advance!
[edit]
I am working on a solution, where I join the table with itself. It's an improvement, but it doesn't do the job yet. I think the other suggestions are better (but more complicated). However, I'd like to share my unfinished work in progress:
SELECT "ID", LEAST(tab1."BEGIN", tab2."BEGIN"), GREATEST(tab1."END", tab2."END")
FROM <mytable> AS tab1
JOIN <mytable> AS tab2
ON tab1."ID" = tab2."ID"
AND (tab1."BEGIN", tab1."END" + INTERVAL '2 day') OVERLAPS (tab2."BEGIN", tab2."END")
ORDER BY "ID"
[edit 2]
Thank you for your help!
I tried to figure out how window-functions and WITH-queries work for some hours by now - until I realised that my database runs on PostGreSQL 8.3 (which doesn't support neither of them). Is there a way to go without window-functions and WITH-queries?
Thank you once more!
[edit 3]
Sample data:
ID BEGIN END
1;"2000-01-01";"2000-03-31"
1;"2000-04-01";"2000-05-31"
1;"2000-04-15";"2000-07-31"
1;"2000-09-01";"2000-10-31"
2;"2000-02-01";"2000-03-15"
2;"2000-01-15";"2000-03-31"
2;"2000-04-01";"2000-04-15"
3;"2000-06-01";"2000-06-15"
3;"2000-07-01";"2000-07-15"
Sample output:
ID BEGIN END
1;"2000-01-01";"2000-07-31"
1;"2000-09-01";"2000-10-31"
2;"2000-01-15";"2000-04-15"
3;"2000-06-01";"2000-06-15"
3;"2000-07-01";"2000-07-15"
[edit 4]
one possible solution:
WITH
t1 AS (
SELECT id, begin AS time
FROM "nace-8510-test".checkfkt
UNION ALL
SELECT id, end
FROM "nace-8510-test".checkfkt
),
t2 AS (
SELECT Row_Number() OVER(PARTITION BY id ORDER BY time) AS num, id, time
FROM t1 AS t1_1
),
t3 AS (
SELECT t2_1.num - Row_Number() OVER(PARTITION BY t2_1.id ORDER BY t2_1.time, t2_2.time) num1,
t2_1.id, t2_1.time AS begin, t2_2.time AS end
FROM t2 AS t2_1
INNER JOIN t2 AS t2_2
ON t2_1.id = t2_2.id
AND t2_1.num = t2_2.num - 1
WHERE
EXISTS (
SELECT *
FROM "nace-8510-test".checkfkt AS s
WHERE s.id = t2_1.id
AND (s.begin < t2_2.time AND s.end > t2_1.time)
)
OR t2_1.time = t2_2.time
OR t2_1.time + INTERVAL '1 day' = t2_2.time
)
SELECT id, MIN(begin) AS von, MAX(end) AS bis
FROM t3
GROUP BY id, num1
ORDER BY id
With many thanks to the author of this article: http://blog.developpez.com/sqlpro/p9821/langage-sql-norme/agregation-d-intervalles-en-sql-1/