I'm trying to figure out a way to delete records with overlapping times but I'm unable to figure out a simple and elegant way of keeping all but one of those records which overlap. This question is similar to this one but with a few differences. Our table looks something like:
╔════╤═══════════════════════════════════════╤══════════════════════════════════════╤════════╤═════════╗
║ id │ start_time │ end_time │ bar │ baz ║
╠════╪═══════════════════════════════════════╪══════════════════════════════════════╪════════╪═════════╣
║ 0 │ Mon, 18 Dec 2017 16:08:33 UTC +00:00 │ Mon, 18 Dec 2017 17:08:33 UTC +00:00 │ "ham" │ "eggs" ║
╟────┼───────────────────────────────────────┼──────────────────────────────────────┼────────┼─────────╢
║ 1 │ Mon, 18 Dec 2017 16:08:32 UTC +00:00 │ Mon, 18 Dec 2017 17:08:32 UTC +00:00 │ "ham" │ "eggs" ║
╟────┼───────────────────────────────────────┼──────────────────────────────────────┼────────┼─────────╢
║ 2 │ Mon, 18 Dec 2017 16:08:31 UTC +00:00 │ Mon, 18 Dec 2017 17:08:31 UTC +00:00 │ "spam" │ "bacon" ║
╟────┼───────────────────────────────────────┼──────────────────────────────────────┼────────┼─────────╢
║ 3 │ Mon, 18 Dec 2017 16:08:30 UTC +00:00 │ Mon, 18 Dec 2017 17:08:30 UTC +00:00 │ "ham" │ "eggs" ║
╚════╧═══════════════════════════════════════╧══════════════════════════════════════╧════════╧═════════╝
In the example above, all records have overlapping times where overlapping just means that the range of time defined by a record's start_time
and end_time
(inclusive) covers or extends over part of another record's. However, for this problem we are interested not only in those records which have overlapping times but also have matching bar
and baz
columns (rows 0, 1, and 3 above). After finding those records we'd like to delete all but the earliest, leaving the table above with just records 2 and 3 because record 2 does not have matching bar
and baz
columns and 3 does and has the earliest start and end times.
Here's what I have so far:
delete from foos where id in (
select
foo_one.id
from
foos foo_one
where
user_id = 42
and exists (
select
1
from
foos foo_two
where
tsrange(foo_two.start_time::timestamp, foo_two.end_time::timestamp, '[]') &&
tsrange(foo_one.start_time::timestamp, foo_one.end_time::timestamp, '[]')
and
foo_one.bar = foo_two.bar
and
foo_one.baz = foo_two.baz
and
user_id = 42
and
foo_one.id != foo_two.id
)
);
Thanks for reading!
Update: I've found a solution that works for me, basically I could apply the window function row_number()
over a partition of the table that are grouped by bar
and baz
fields and then add a WHERE
clause to the DELETE
statement that excludes the first entry (the one with the smallest id
).
delete from foos where id in (
select id from (
select
foo_one.id,
row_number() over(partition by
bar,
baz
order by id asc)
from
foos foo_one
where
user_id = 42
and exists (
select
*
from
foos foo_two
where
tsrange(foo_two.start_time::timestamp,
foo_two.end_time::timestamp,
'[]') &&
tsrange(foo_one.start_time::timestamp,
foo_one.end_time::timestamp,
'[]')
and
foo_one.id != foo_two.id
)
) foos where row_number <> 1
);