I am trying to aggregate a dataset I'll call cust_info
. It looks like this:
ID Sec_ID Group_id Cust_ID Gender EFF_DATE END_DATE
--------------------------------------------------------------------
11 H12 222 12 F 1/1/2014 12/31/2014
11 H11 222 31 F 1/1/2015 12/31/2015
11 H11 222 12 F 1/1/2016 4/30/2016
11 H11 222 44 F 5/1/2016 4/30/2017
11 H11 333 11 F 5/1/2017 12/31/9999
22 H23 222 22 M 12/1/2015 11/30/2016
22 H21 222 11 M 1/1/2017 6/30/2017
22 H21 222 33 M 7/1/2017 11/30/2017
I want to get the minimum EFF_DATE
and the maximum END_DATE
for each ID
, sec_id
. I also want the group_id
and cust_id
from the record with the maximum END_DATE
.
So I end up with:
11 H11 333 11 F 1/1/2014 12/31/9999
22 H21 222 33 M 12/1/2015 11/30/2017
Currently my code pulls min(eff_date)
and Max(end_date)
with a group by ID, Sec_id, Grp_id, Gender
. But if there are more than two records for a group this doesn't work. Also, this is an inner query that joins to another file.
Here's the code I'm using now:
select a.id, b.sec_id, b.group_id, b.cust_id, b.gender,
min(b.min_eff_date) as min_eff_date,
max(b.max_end_date) as max_end_date
from first_dataset a
left join (
select b.id, b.sec_id, b.group_id, b.gender, b.cust_id,
min(b.eff_date) as min_eff_date,
max(b.end_date) as max_end_date
from cust_info b
group by b.id, b.sec_id, b.group_id, b.cust_id, b.gender
) b on a.id=b.id and
a.sec_id = b.sec_id
And then I run another query on the results of the above with a min(min_eff_date)
and a max(max_end_date)
. But I still get duplicates.
I want to see if I can do this in one query. I've tried a bunch of combinations of ROW_NUMBER
. I've also tried using the KEEP(DENSE_RANK LAST ORDER BY MAX_END_DATE)
.
Can I do this in one query?
The data and code I've provided are all test examples, the real data involves ~ 3 million rows.