I am suck in this one. Wish I could do it in pure sql, but at this point any solution will do.
I have ta
and tb
tables, containing lists of events that occurred approximately at the same time. The goal is to find "orphan" records from ta
on tb
. E.g.:
create table ta ( dt date, id varchar(1));
insert into ta values( to_date('20130101 13:01:01', 'yyyymmdd hh24:mi:ss') , '1' );
insert into ta values( to_date('20130101 13:01:02', 'yyyymmdd hh24:mi:ss') , '2' );
insert into ta values( to_date('20130101 13:01:03', 'yyyymmdd hh24:mi:ss') , '3' );
create table tb ( dt date, id varchar(1));
insert into tb values( to_date('20130101 13:01:5', 'yyyymmdd hh24:mi:ss') , 'a' );
insert into tb values( to_date('20130101 13:01:6', 'yyyymmdd hh24:mi:ss') , 'b' );
But let's say I must use a threshold of +-5 seconds. So, the query to find would look something like:
select
ta.id ida,
tb.id idb
from
ta, tb
where
tb.dt between (ta.dt - 5/86400) and (ta.dt + 5/86400)
order by 1,2
(fiddle: http://sqlfiddle.com/#!4/b58f7c/5)
The rules are:
- Events are mapped 1 to 1
- The closest event on
tb
for a given one inta
will be considered the correct mapping.
That said, the resulting query should return something like
IDA | IDB
1 | a
2 | b
3 | null <-- orphan event
Though the sample query I've put here shows exactly the issue I am having. When the time overlaps, it is difficult to systematically choose the correct row.
dense_rank()
seems to be the answer to select the correct rows, but what partitioning/sorting will place them right?
Worth mentioning, I am doing this on a Oracle 11gR2.