I have a requirement to retrieve a list of employees, and for each employee a list of months they were actively on benefits coverage in a given year. There is a table with job data, and a table with benefits information. There is also a delivered dates table that lists out every date from 2007-2018 and for each date it shows the day of month, month of year, and calendar year.

The way I have written the query now is to find all the dates on the dates table that are 1) between 01/01 and 12/31 of the prompt year (or the current date, whichever is earlier), and 2) during the time the employee was active on the benefits table. For each date I also want the deptid from the jobs table and the benefit plan from the benefits table as of that date. Then I do a distinct, showing only the month of year and calendar year for each employee.

This works, but the problem comes when I try to do it for departments with lots of people in them. It takes a very long time to run, I believe because it retrieves up to 365 rows for every single employee and then shows at most 12 of those, since it only pulls distinct months. I feel like there is a better way to do this; I just can't think of what it is.

Here are some simplified examples of the tables I'm working with:

Dates Table

THE_DATE   MONTHOFYEAR   CALENDAR_YEAR
01-OCT-15  10            2015
02-OCT-15  10            2015
03-OCT-15  10            2015
...

Jobs Table

(A=Active; I=Inactive)

EMPLID     EFFDT         DEPTID           HR_STATUS
00123      01-FEB-15     900              A
00123      30-JUN-15     900              I
00123      01-AUG-15     901              A

Benefits Table

EMPLID     EFFDT         BENEFIT_PLAN     STATUS
00123      01-MAR-15     PPO              A
00123      31-JUL-15                      I
00123      01-SEP-15     HMO              A

Desired Result

EMPLID     CALENDAR_YEAR MONTHOFYEAR      DEPTID         BENEFIT_PLAN
00123      2015          3                900            PPO
00123      2015          4                900            PPO
00123      2015          5                900            PPO
00123      2015          6                900            PPO
00123      2015          7                900            PPO
00123      2015          9                901            HMO
00123      2015          10               901            HMO
00123      2015          11               901            HMO
^ (shows November row even though employee was only covered for part of this month)

Example SQL to Get Results Above

SELECT DISTINCT J.EMPLID, D.CALENDAR_YEAR, D.MONTHOFYEAR, J.DEPTID, B.BENEFIT_PLAN
FROM DATES D, 
     JOBS J 
     JOIN 
     BENEFITS B 
     ON J.EMPLID = B.EMPLID
WHERE D.THE_DATE <= SYSDATE
AND D.THE_DATE BETWEEN 
        TO_DATE(:YEAR_PROMPT || '-01-01', 'YYYY-MM-DD') 
        AND 
        TO_DATE(:YEAR_PROMPT || '-12-31', 'YYYY-MM-DD')
AND B.STATUS = 'A'
AND D.THE_DATE BETWEEN 
        B.EFFDT 
        AND 
        NVL((SELECT MIN(B_ED.EFFDT) 
             FROM BENEFITS B_ED
             WHERE B_ED.EMPLID = B.EMPLID
             AND B_ED.EFFDT > B.EFFDT)
        , SYSDATE)
AND J.EFFDT = (SELECT MAX(J_ED.EFFDT)
               FROM JOBS J_ED
               WHERE J_ED.EMPLID = J.EMPLID
               AND J_ED.EFFDT <= D.THE_DATE)

Instead of saying "retrieve every single date and check to see if it fits the criteria", can I change up the logic somehow to get the same results without churning through so many rows?

1 Answer

Yes; by using the LEAD() analytic function, you can calculate the next effdt in the jobs and benefits tables, which makes it easier to query between the ranges.
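
To see what LEAD() contributes on its own, here is a minimal sketch using only the three sample jobs rows from the question. The inline jobs subquery merely stands in for the real table, and the range_start / range_end aliases and the 'YYYY-MM-DD' display format are illustrative choices, not part of the original query. Each row picks up the effdt of the next row for the same employee, and the last row falls back to sysdate:

```sql
-- Minimal sketch: turn effective-dated rows into (start, end) ranges with LEAD().
-- The inline "jobs" subquery only mimics the question's sample data.
with jobs as (select 123 emplid, date '2015-02-01' effdt, 900 deptid, 'A' hr_status from dual union all
              select 123, date '2015-06-30', 900, 'I' from dual union all
              select 123, date '2015-08-01', 901, 'A' from dual)
select emplid,
       deptid,
       hr_status,
       to_char(effdt, 'YYYY-MM-DD') range_start,
       -- the third argument of lead() is the default used when there is no following row
       to_char(lead(effdt, 1, sysdate) over (partition by emplid order by effdt),
               'YYYY-MM-DD') range_end
from   jobs
order  by emplid, effdt;
```

The first two rows end at 2015-06-30 and 2015-08-01 respectively, and the last row ends at the current date. With the ranges materialised like this, the correlated MIN/MAX subqueries from the question are no longer needed.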

Something like:

with dates as (select trunc(sysdate, 'yyyy') - 1 + level the_date,
                      to_number(to_char(trunc(sysdate, 'yyyy') - 1 + level, 'mm')) monthofyear,
                      to_number(to_char(sysdate, 'yyyy')) calendar_year
               from   dual
               connect by level <= 365),
      jobs as (select 123 emplid, to_date('01/02/2015', 'dd/mm/yyyy') effdt, 900 deptid, 'A' hr_status from dual union all
               select 123 emplid, to_date('30/06/2015', 'dd/mm/yyyy') effdt, 900 deptid, 'I' hr_status from dual union all
               select 123 emplid, to_date('01/08/2015', 'dd/mm/yyyy') effdt, 901 deptid, 'A' hr_status from dual),
  benefits as (select 123 emplid, to_date('01/03/2015', 'dd/mm/yyyy') effdt, 'PPO' benefit_plan, 'A' status from dual union all
               select 123 emplid, to_date('31/07/2015', 'dd/mm/yyyy') effdt, null benefit_plan, 'I' status from dual union all
               select 123 emplid, to_date('01/09/2015', 'dd/mm/yyyy') effdt, 'HMO' benefit_plan, 'A' status from dual),
-- ********* end of mimicking your tables ********* --
         j as (select emplid,
                      effdt,
                      deptid,
                      hr_status,
                      lead(effdt, 1, sysdate) over (partition by emplid order by effdt) next_effdt
               from   jobs),
         b as (select emplid,
                      effdt,
                      benefit_plan,
                      status,
                      lead(effdt, 1, sysdate) over (partition by emplid order by effdt) next_effdt
               from   benefits)
select distinct j.emplid,
                d.calendar_year,
                d.monthofyear,
                j.deptid,
                b.benefit_plan
from   j
       inner join dates d on (d.the_date >= j.effdt and d.the_date < j.next_effdt)
       inner join b on (j.emplid = b.emplid)
where  d.the_date <= sysdate
and    d.the_date between to_date (:year_prompt || '-01-01', 'YYYY-MM-DD')
                      and to_date (:year_prompt || '-12-31', 'YYYY-MM-DD')
       -- if no index on d.the_date, maybe use extract(year from d.the_date) = :year_prompt
and    b.status = 'A'
and    d.the_date between b.effdt and b.next_effdt
order by 1, 4, 2, 3;

    EMPLID CALENDAR_YEAR MONTHOFYEAR     DEPTID BENEFIT_PLAN
---------- ------------- ----------- ---------- ------------
       123          2015           3        900 PPO         
       123          2015           4        900 PPO         
       123          2015           5        900 PPO         
       123          2015           6        900 PPO         
       123          2015           7        900 PPO         
       123          2015           9        901 HMO         
       123          2015          10        901 HMO         
       123          2015          11        901 HMO   

(Obviously, you can exclude the dates, jobs and benefits subqueries from the above query, since you already have those tables. They're only present to simulate tables containing that data without needing to actually create them.)


ETA: Here's a version that just calculates the 12 months based on the year that's passed in, which reduces the date rows to 12, rather than 365/366 rows.

Unfortunately, you'll still need the distinct, to take account of when you have multiple rows starting in the same month.

For example, with the data below, you would end up with 3 rows for month 6 if you removed the distinct. However, the distinct now operates over far fewer rows than before.

with dates as (select add_months(to_date(:year_prompt || '-01-01', 'YYYY-MM-DD'), - 1 + level) the_date,
                      level monthofyear,
                      :year_prompt calendar_year -- assuming this is a number
               from   dual
               connect by level <= 12),
      jobs as (select 123 emplid, to_date('01/02/2015', 'dd/mm/yyyy') effdt, 900 deptid, 'A' hr_status from dual union all
               select 123 emplid, to_date('15/06/2015', 'dd/mm/yyyy') effdt, 900 deptid, 'I' hr_status from dual union all
               select 123 emplid, to_date('26/06/2015', 'dd/mm/yyyy') effdt, 900 deptid, 'A' hr_status from dual union all
               select 123 emplid, to_date('01/08/2015', 'dd/mm/yyyy') effdt, 901 deptid, 'A' hr_status from dual),
  benefits as (select 123 emplid, to_date('01/03/2015', 'dd/mm/yyyy') effdt, 'PPO' benefit_plan, 'A' status from dual union all
               select 123 emplid, to_date('31/07/2015', 'dd/mm/yyyy') effdt, null benefit_plan, 'I' status from dual union all
               select 123 emplid, to_date('01/09/2015', 'dd/mm/yyyy') effdt, 'HMO' benefit_plan, 'A' status from dual),
-- ********* end of mimicking your tables ********* --
         j as (select emplid,
                      trunc(effdt, 'mm') effdt,
                      deptid,
                      hr_status,
                      trunc(coalesce(lead(effdt) over (partition by emplid order by effdt) -1, sysdate), 'mm') end_effdt
                        -- subtract 1 from lead(effdt) because the original sql used d.the_date < j.next_effdt;
                        -- when the next effdt is the first of a month, the range must end in the previous month
               from   jobs),
         b as (select emplid,
                      trunc(effdt, 'mm') effdt,
                      benefit_plan,
                      status,
                      trunc(lead(effdt, 1, sysdate) over (partition by emplid order by effdt), 'mm') end_effdt
               from   benefits)
select distinct j.emplid,
                d.calendar_year,
                d.monthofyear,
                j.deptid,
                b.benefit_plan
from   j
       inner join dates d on (d.the_date between j.effdt and j.end_effdt)
       inner join b on (j.emplid = b.emplid)
where  d.the_date <= sysdate
and    b.status = 'A'
and    d.the_date between b.effdt and b.end_effdt
order by 1, 4, 2, 3;

    EMPLID CALENDAR_YEAR MONTHOFYEAR     DEPTID BENEFIT_PLAN                    
---------- ------------- ----------- ---------- --------------------------------
       123 2015                    3        900 PPO                             
       123 2015                    4        900 PPO                             
       123 2015                    5        900 PPO                             
       123 2015                    6        900 PPO                             
       123 2015                    7        900 PPO                             
       123 2015                    9        901 HMO                             
       123 2015                   10        901 HMO                             
       123 2015                   11        901 HMO    
Boneist
  • This SQL is interesting for the subquery to create the `dates` table. I'll have to keep that in case I can use it later. However, it is still retrieving every date and checking to see if it fits the criteria. For example, if you remove the `distinct` keyword from the query, 219 rows are returned. My goal is to prevent the SQL from churning through so many rows. Is there any way to get the same results without using the `distinct` keyword? –  Nov 05 '15 at 20:02
  • Duh; I meant to check the distinct part, but forgot! Will look again at this tomorrow. – Boneist Nov 05 '15 at 20:17
  • I appreciate your help, this one has been hard for me to figure out. I didn't say it specifically in the question, although I probably should have, that I was assuming this could be done without the `dates` table at all. I thought there might be some way to look at the ranges on the `benefits` table and come up with the list of months they were active. But it may not be possible without using the dates table and asking the "does this date match the criteria" question on every row. –  Nov 05 '15 at 20:23
  • It's entirely possible that you could avoid the dates table, although that's a little more complex. What you could do in the meantime is use the row_number() analytic function against the dates table per trunc(the_date, 'mm') and then join that against the jobs and benefits tables (where row_number = 1), although you'll have to trunc the start/end dates to the start of the month too. – Boneist Nov 05 '15 at 20:31
  • Ok, I've updated my answer with another solution. Unfortunately, I don't think you'll be able to remove the distinct (unless you can guarantee that the job/benefits tables will never have more than one row with an effdt in the same month). – Boneist Nov 06 '15 at 11:20
  • Thanks for taking the time to work on this. It is not exactly what I wanted but it sounds like what I want may not really be possible. So, I'll go ahead and mark yours as the answer. Thanks! –  Nov 09 '15 at 17:36
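
The row_number() suggestion from the comments was never shown in code. A minimal sketch of it follows, assuming the delivered dates table is called DATES with the THE_DATE, MONTHOFYEAR and CALENDAR_YEAR columns from the question, and that :year_prompt is a numeric bind. It numbers the dates within each month and keeps only the first, leaving at most 12 rows per year to join against:

```sql
-- Sketch only: one row per calendar month from the existing DATES table.
select the_date, monthofyear, calendar_year
from  (select d.the_date,
              d.monthofyear,
              d.calendar_year,
              -- number the days inside each month, earliest first
              row_number() over (partition by trunc(d.the_date, 'mm')
                                 order by d.the_date) rn
       from   dates d
       where  d.calendar_year = :year_prompt)
where  rn = 1
order  by the_date;
```

This result could take the place of the generated dates subquery in the second query of the answer. The job and benefit effdt values would still need trunc(effdt, 'mm'), and the distinct is still required whenever several rows share a month.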