I have a scenario where I need to join from my starting table, account_tbl
to a stu_ssn_tbl
. The columns of my join are stu_no
and stu_seq_no
.
Here's the structure of stu_ssn_tbl
CREATE TABLE stu_ssn_tbl
(
curr_ssn_ind character(1),
no character(9),
pseudo_ssn_ind character(1),
src character(8),
stu_dob date,
stu_no integer ,
stu_seq_no integer,
sys_cr_dt date,
do_not_use character(1)
);
Here's my sample data:
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('N','100 ','R','234600 ',to_timestamp('29-DEC-50','DD-MON-RR HH.MI.SSXFF AM'),1,7,to_timestamp('14-AUG-12','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('Y','101','R','234600 ',to_timestamp('29-DEC-50','DD-MON-RR HH.MI.SSXFF AM'),1,7,to_timestamp('14-AUG-12','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('Y','102 ','R','713001 ',to_timestamp('11-JAN-00','DD-MON-RR HH.MI.SSXFF AM'),1,6,to_timestamp('14-APR-03','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('N','203 ','R','368700 ',to_timestamp('05-DEC-53','DD-MON-RR HH.MI.SSXFF AM'),2,3,to_timestamp('01-JUL-98','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('Y','104 ','R','368700 ',to_timestamp('05-DEC-53','DD-MON-RR HH.MI.SSXFF AM'),2,3,to_timestamp('01-JUL-98','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('N','105 ','P','341100 ',to_timestamp('20-JUL-43','DD-MON-RR HH.MI.SSXFF AM'),46,3,to_timestamp('11-FEB-13','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('Y','106','P','341100 ',to_timestamp('20-JUL-43','DD-MON-RR HH.MI.SSXFF AM'),46,3,to_timestamp('11-FEB-13','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('N','107 ','R','184300 ',to_timestamp('27-JAN-55','DD-MON-RR HH.MI.SSXFF AM'),3,2,to_timestamp('22-MAY-01','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('Y','108','R','184300 ',to_timestamp('27-JAN-55','DD-MON-RR HH.MI.SSXFF AM'),3,2,to_timestamp('22-MAY-01','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('N','109 ','R','368700 ',to_timestamp('27-NOV-61','DD-MON-RR HH.MI.SSXFF AM'),1,3,to_timestamp('01-JUL-98','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('Y','601 ','R','368700 ',to_timestamp('27-NOV-61','DD-MON-RR HH.MI.SSXFF AM'),1,3,to_timestamp('01-JUL-98','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('N','900 ','R','293900 ',to_timestamp('20-JUN-71','DD-MON-RR HH.MI.SSXFF AM'),1,5,to_timestamp('23-APR-02','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('Y','2449','R','293900 ',to_timestamp('20-JUN-71','DD-MON-RR HH.MI.SSXFF AM'),1,5,to_timestamp('09-APR-03','DD-MON-RR HH.MI.SSXFF AM'),null);
Insert into stu_ssn_tbl (CURR_SSN_IND,NO,PSEUDO_SSN_IND,SRC,STU_DOB,STU_NO,STU_SEQ_NO,SYS_CR_DT,DO_NOT_USE) values ('N','95','P','293900 ',to_timestamp('20-JUN-71','DD-MON-RR HH.MI.SSXFF AM'),1,5,to_timestamp('23-APR-02','DD-MON-RR HH.MI.SSXFF AM'),null);
For the sake of simplicity, we'll say this is the structure of the account
table.
CREATE TABLE account_tbl
(
id integer,
stu_no integer,
stu_seq_no integer
);
So I'm going from account to stu_ssn_tbl on stu_no and stu_seq_no
SELECT *
FROM ACCOUNT_TBL ACCT
LEFT OUTER JOIN STU_SSN_TBL SSN
ON ACCT.STU_NO = SSN.STU_NO
AND ACCT.STU_SEQ_NO = SSN.STU_SEQ_NO;
The problem is that there is no enforced uniqueness on any of these tables (these tables were data dumps from an old database that nobody knows anything about). I need a way to select the correct record from stu_ssn_tbl based on the following rules.
- SYS_CR_DT date closest to the ROLLUPDATE( a date that will be provided, without going over it). For the sake of the test case we can say '2012-12-12'
- If two or more SYS_CR_DT dates exist that are the same, choose the record with a CURR_SSN_IND = 'Y'.
- If a record cannot be selected based on CURR_SSN_IND = 'Y' ( in the case that they are both 'Y' or 'N') select the record where PSEUDO_SSN_IND = 'R'
- If a match still cannot be selected, choose record at random.
I do not have a requirement as far as using PL/SQL or regular SQL to accomplish this, just need to keep in mind that I'm dealing with over 300 million records in the account_tbl and close to 100 million records in the stu_ssn_tbl.
I have two trains of thoughts on this. The preferable way to this would be during the join between account_tbl and stu_ssn_tbl. However, if this is not possible, I've added the do_not_use column, which is can use to functionally delete records that I do not want in pre processing.
So far I have this - which helps me settle bullet 1, and I decided to order an order by desc on CURR_SSN_IND and PSEUDO_SSN_IND because it would give me the record I would watch at the top. I tried putting a fetch first 1 row on it, but that brought back 1 record for the entire table.
SELECT
SRC.*
FROM STU_SSN_TBL SRC
LEFT OUTER JOIN STU_SSN_TBL CHK
ON SRC.STU_NO = CHK.STU_NO AND SRC.STU_SEQ_NO = CHK.STU_SEQ_NO
AND SRC.SYS_CR_DT < CHK.SYS_CR_DT
AND SRC.SYS_CR_DT < TO_DATE('2012-12-12', 'yyyy-mm-dd')
WHERE CHK.SYS_CR_DT IS NULL
ORDER BY SRC.STU_NO ASC, SRC.STU_SEQ_NO ASC, SRC.CURR_SSN_IND DESC , SRC.PSEUDO_SSN_IND DESC