0

I am trying to aggregate a dataset I'll call cust_info. It looks like this:

ID    Sec_ID    Group_id  Cust_ID   Gender   EFF_DATE   END_DATE  
--------------------------------------------------------------------
11      H12       222       12        F       1/1/2014    12/31/2014  
11      H11       222       31        F       1/1/2015    12/31/2015  
11      H11       222       12        F       1/1/2016     4/30/2016  
11      H11       222       44        F       5/1/2016     4/30/2017  
11      H11       333       11        F       5/1/2017    12/31/9999  
22      H23       222       22        M      12/1/2015    11/30/2016  
22      H21       222       11        M       1/1/2017     6/30/2017  
22      H21       222       33        M       7/1/2017    11/30/2017

I want to get the minimum EFF_DATE and the maximum END_DATE for each ID, sec_id. I also want the group_id and cust_id from the record with the maximum END_DATE.

So I end up with:

11      H11       333       11        F        1/1/2014    12/31/9999  
22      H21       222       33        M       12/1/2015    11/30/2017 

Currently my code pulls min(eff_date) and Max(end_date) with a group by ID, Sec_id, Grp_id, Gender. But if there are more than two records for a group this doesn't work. Also, this is an inner query that joins to another file.

Here's the code I'm using now:

select a.id, b.sec_id, b.group_id, b.cust_id, b.gender,  
   min(b.min_eff_date) as min_eff_date, 
   max(b.max_end_date) as max_end_date  
from first_dataset a  
left join (  
  select b.id, b.sec_id, b.group_id, b.gender, b.cust_id, 
         min(b.eff_date) as min_eff_date, 
         max(b.end_date) as max_end_date
  from cust_info b  
  group by b.id, b.sec_id, b.group_id, b.cust_id, b.gender
) b  on a.id=b.id  and 
        a.sec_id = b.sec_id

And then I run another query on the results of the above with a min(min_eff_date) and a max(max_end_date). But I still get duplicates.

I want to see if I can do this in one query. I've tried a bunch of combinations of ROW_NUMBER. I've also tried using the KEEP(DENSE_RANK LAST ORDER BY MAX_END_DATE).

Can I do this in one query?
The data and code I've provided are all test examples, the real data involves ~ 3 million rows.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
D Crimkey
  • 13
  • 3

2 Answers2

0

I think the folowing request will do the job:

SELECT DISTINCT a.id,
       b.sec_id,
       FIRST_VALUE(b.group_id) OVER (PARTITION BY a.id, b.sec_id GROUP BY b.end_date DESC) group_id,
       FIRST_VALUE(b.cust_id) OVER (PARTITION BY a.id, b.sec_id GROUP BY b.end_date DESC) group_id,
       b.gender,  
       min(b.eff_date) OVER (PARTITION BY a.id, b.sec_id) as min_eff_date, 
       max(b.end_date) OVER (PARTITION BY a.id, b.sec_id) as max_end_date  
  FROM first_dataset a,
       cust_info b   
 WHERE a.id = b.id (+)
   AND a.sec_id = b.sec_id (+)
Guillaume
  • 81
  • 4
0

This does what your description says:

WITH cte AS (
  SELECT row_number() OVER (PARTITION BY id, sec_id ORDER BY end_date DESC) AS rn
       , ID, Sec_ID, Group_id, Cust_ID, Gender
       , min(eff_date) OVER (PARTITION BY id, sec_id) AS EFF_DATE  -- exception
       , END_DATE
   FROM   cust_info
   )
SELECT ID, Sec_ID, Group_id, Cust_ID, Gender, EFF_DATE, END_DATE  
FROM   cte
WHERE  rn = 1;

Key element is the analytic function ROW_NUMBER() in the CTE.

Neither your displayed result nor your query currently fit the description.

SQL Fiddle.

Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Yes this takes care of the inner query -- thank you. Would I be able to use this as my inner query joining to my outer query? So I would have an outer query selecting from two tables, the inner query is what you have described and then left joining the outer query to inner query on ID? Also, how were you able to reformat what I sent – D Crimkey Dec 12 '17 at 18:07
  • @DCrimkey: Click on [edit] to edit your question. You can always edit your own questions. And high reputation users can edit any post. And yes, you can join subqueries, tables or CTEs in various ways. (All table expressions.) – Erwin Brandstetter Dec 13 '17 at 01:47