1

enter image description hereI am trying to join two tables w.r.t the nearest date. Please see the screenshot for reference.

I have Medication Table and Side Effects Table and I need to know what is the medical location for a particular side affect.

Ex 1: Medication started on 2015-07-23 (2nd row in Medication Table) and there was a Side affect on 2015-07-24 (2nd row in Side Effect Table). So this tells me that the medical location is ABDOMEN.

Ex 2: Medication started on 2018-06-19 (last row in Medication Table) and there was a Side affect on 2018-07-25 (last row in Side Effect Table). So this tells me that the medical location is THIGH.

So, here I am just trying to map the nearest dates of both the tables and get that particular Medical Location in the resulting table.

When I write the join in the below way: MID = SID AND MNAME = SNAME AND Medication Start Date <= Side Effect Date then I am getting too many records. Looks like I am missing some condition to write here. Can you please help me on this.

Thanks.

Medication Table:
MID     MNAME       MedicalLocation   Medication Start Date
I1F-BE  132~1201    THIGH             2015-07-07
I1F-BE  132~1202    ABDOMEN           2015-07-23
I1F-BE  132~1203    ABDOMEN           2015-08-04
I1F-BE  132~1204    ABDOMEN           2015-08-18
I1F-BE  132~1205    ABDOMEN           2015-09-01
I1F-BE  132~1206    ABDOMEN           2015-09-15
I1F-BE  132~1207    ABDOMEN           2015-09-29
I1F-BE  132~1208    ABDOMEN           2015-10-13
I1F-BE  132~1209    THIGH             2015-10-27
I1F-BE  132~1210    ABDOMEN           2015-11-10
I1F-BE  132~1201    THIGH             2018-06-19
I1F-BE  132~1209    THIGH             2015-10-27
I1F-BE  132~1210    ABDOMEN           2015-11-10
I1F-BE  132~1201    THIGH             2018-06-19

Side Effect Table:

SID     SNAME        Side Effect             Side Effect Start Date
I1F-BE  132~1201    UTI                      2015-06-23
I1F-BE  132~1202    Injection Site Reaction  2015-07-24
I1F-BE  132~1203    Injection Site Reaction  2015-08-05
I1F-BE  132~1204    Viral Syndrome           2015-08-10
I1F-BE  132~1205    Injection Site Reaction  2015-08-18
I1F-BE  132~1206    Injection Site Reaction  2015-09-02
I1F-BE  132~1207    Injection Site Reaction  2015-09-16
I1F-BE  132~1208    Injection Site Reaction  2015-09-30
I1F-BE  132~1209    Injection Site Reaction  2015-10-14
I1F-BE  132~1210    Injection Site Reaction  2015-10-28
I1F-BE  132~1201    Basal Cell carcinoma     2018-07-25
san
  • 163
  • 1
  • 15

2 Answers2

2

The join approach work fine for small to middle tables, but has a poor performance on larger tables. The problem is that you must join all medication with all side effects only to discard all but the nearest combination.

Alternative approach that should scale uses a unified view to the both tables with an analytic function lookup to the LAGing record to get the location.

The unified view is produced with this query. Note that a new column SOURCE_TYPE is added to distinct the row source.

with tab as (
select SID, SNAME, START_DATE, 'SIDE_EFFECT' source_type,cast (null as varchar2(25)) MEDICALLOCATION,  SideEffect from SideEffect
union all
select MID, MNAME, START_DATE, 'MEDICATION' source_type, MEDICALLOCATION, null as SideEffect from Medication
)
select SID, SNAME, START_DATE, SOURCE_TYPE, MEDICALLOCATION, SIDEEFFECT 
from tab
where SID = 'I1F-BE' and SNAME = '132~1202'
order by 1,2,3;

SID        SNAME      START_DATE          SOURCE_TYPE MEDICALLOCATION           SIDEEFFECT              
---------- ---------- ------------------- ----------- ------------------------- -------------------------
I1F-BE     132~1202   23.07.2015 00:00:00 MEDICATION  ABDOMEN                                             
I1F-BE     132~1202   24.07.2015 00:00:00 SIDE_EFFECT                           Injection Site Reaction   

The only additional logic required is for the SIDE_EFFECT record, if preceeded directly by a MEDICATION record, to get the value of MEDICALLOCATION column from the lagging record.

The analytical function LAG uses PARTITION BY on your join key columns and the ORDER BY is defined by the start_date.

After moving the location to the site_effect row, you may discard all the medication records.

Final Query

with tab as (
select SID, SNAME, START_DATE, 'SIDE_EFFECT' source_type,cast (null as varchar2(25)) MEDICALLOCATION,  SideEffect from SideEffect
union all
select MID, MNAME, START_DATE, 'MEDICATION' source_type, MEDICALLOCATION, null as SideEffect from Medication
),
tab2 as (
select SID, SNAME, START_DATE, SOURCE_TYPE, MEDICALLOCATION,  SIDEEFFECT,
CASE when SOURCE_TYPE = 'SIDE_EFFECT' and
          lag(SOURCE_TYPE) over (partition by  SID, SNAME order by START_DATE) = 'MEDICATION' then 
     lag(MEDICALLOCATION) over (partition by  SID, SNAME order by START_DATE)
END as SIDEEFFECT_MEDICALLOCATION  
from tab)
select SID, SNAME, START_DATE, SIDEEFFECT, SIDEEFFECT_MEDICALLOCATION 
from tab2
where source_type = 'SIDE_EFFECT' 
order by 1,2,3;

SID        SNAME      START_DATE          SIDEEFFECT                SIDEEFFECT_MEDICALLOCATION
---------- ---------- ------------------- ------------------------- --------------------------
I1F-BE     132~1201   23.06.2015 00:00:00 UTI                                                  
I1F-BE     132~1201   25.07.2018 00:00:00 Basal Cell carcinoma      THIGH                      
I1F-BE     132~1202   24.07.2015 00:00:00 Injection Site Reaction   ABDOMEN                    
I1F-BE     132~1203   05.08.2015 00:00:00 Injection Site Reaction   ABDOMEN                    
I1F-BE     132~1204   10.08.2015 00:00:00 Viral Syndrome  
...
Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53
1

It sounds like you want to get the row from the Medication table which has the max medication date <= side effect date. This is a pretty common SQL pattern: see this question for an example with some great responses.

I've got a few suggestions for you. This first one is the naive way to write this - it will work on all Oracle versions. One possible issue to consider: what do you want to happen when "the nearest date" matches TWO Medications with identical start dates? This query will return one row for each match, which might be more than you expect.

select s.*, m.MedicalLocation
from side_effects s
left join medication m
  on m.mid = s.sid
  and m.mname = s.sname
  and m.medication_start_date = (
    select max(medication_start_date)
    from medication m2
    where m2.mid = m.mid
      and m2.mname = m.mname
      and m2.medication_start_date <= s.side_effect_date)
;

You can also write this somewhat more simply on Oracle 12c and up, with the fetch first 1 row only clause. This will guarantee that you only get 1 Medication record for each Side Effect - if you have two matches, Oracle will just choose one depending on how the rows are arranged on disk.

select s.*, 
    (select m.MedicalLocation
     from medication m
     where m.mid = s.sid
       and m.mname = s.sname
       and m.medication_start_date <= s.side_effect_date
     order by m.medication_start_date desc
     fetch first 1 row only) as MedicalLocation
from side_effects s
;

You can also check out the other answers to the question I linked for other ways to do this, e.g.

select s.*, max(m.MedicalLocation) KEEP (DENSE_RANK FIRST ORDER BY m.medication_start_date desc) as MedicalLocation
from side_effects s
left join medication m
  on m.mid = s.sid
  and m.mname = s.sname
  and m.medication_start_date <= s.side_effect_date
;
kfinity
  • 8,581
  • 1
  • 13
  • 20
  • Query in the middle is fine. The first query produces `ORA-01799: a column may not be outer-joined to a subquery` on Oracle 11. In 12c works but returns **duplicated row** for the duplicated medication on `2018-06-19`. Last query lacks the `GROUP BY`. – Marmite Bomber Jan 24 '19 at 06:48