Oracle SQL: Get the max record for each duplicate ID in self join table

Question

It's been marked as a duplicate and seems to be explained a bit in the linked questions, but I'm still trying to get the separate DEBIT and CREDIT columns on the same row.

I've created a View and I am currently self joining it. I'm trying to get the max Header_ID for each date.

My SQL is currently:

SELECT DISTINCT
TAB1.id, 
TAB1.glperiods_id, 
MAX(TAB2.HEADER_ID),
TAB1.batch_date, 
TAB1.debit, 
TAB2.credit, 
TAB1.descrip

FROM
IQMS.V_TEST_GLBATCH_GJ TAB1

LEFT OUTER JOIN
IQMS.V_TEST_GLBATCH_GJ TAB2
ON
TAB1.ID = TAB2.ID AND TAB1.BATCH_DATE = TAB2.BATCH_DATE AND TAB1.GLPERIODS_ID = TAB2.GLPERIODS_ID AND TAB1.DESCRIP = TAB2.DESCRIP AND TAB1.DEBIT <> TAB2.CREDIT

WHERE
TAB1.ACCT = '3648-00-0'
AND
TAB1.DESCRIP NOT LIKE '%INV%'
AND TAB1.DEBIT IS NOT NULL

GROUP BY
TAB1.id, 
TAB1.glperiods_id, 
TAB1.batch_date, 
TAB1.debit, 
TAB2.credit, 
TAB1.descrip

ORDER BY TAB1.batch_date

And the output for this is (37 rows in total):

I'm joining the table onto itself to get DEBIT and CREDIT on the same line. How do I select only the rows with the max HEADER_ID per BATCH_DATE ?

Update

For @sagi

Those highlighted with the red box are the rows I want and the ones in blue would be the ones I'm filtering out.

Fixed mistake

I recently noticed I had joined my table onto itself without making sure TAB2 ACCT='3648-00-0'.

The corrected SQL is here:

SELECT DISTINCT
TAB1.id, 
TAB1.glperiods_id, 
Tab1.HEADER_ID,
TAB1.batch_date, 
TAB1.debit, 
TAB2.credit, 
TAB1.descrip

FROM
IQMS.V_TEST_GLBATCH_GJ TAB1

LEFT OUTER JOIN
IQMS.V_TEST_GLBATCH_GJ TAB2
ON
TAB1.ID = TAB2.ID AND TAB1.BATCH_DATE = TAB2.BATCH_DATE AND TAB2.ACCT ='3648-00-0'AND TAB1.GLPERIODS_ID = TAB2.GLPERIODS_ID AND TAB1.DESCRIP = TAB2.DESCRIP AND TAB1.DEBIT <> TAB2.CREDIT

WHERE
TAB1.ACCT = '3648-00-0'
AND
TAB1.DESCRIP NOT LIKE '%INV%'
AND TAB1.DEBIT IS NOT NULL

ORDER BY TAB1.BATCH_DATE

score 1 · Accepted Answer · answered Aug 29 '17 at 14:00

1

Use window function like ROW_NUMBER() :

SELECT s.* FROM (
    SELECT t.*,
           ROW_NUMBER() OVER(PARTITION BY t.batch_id ORDER BY t.header_id DESC) as rnk
    FROM YourTable t
    WHERE t.ACCT = '3648-00-0'
      AND t.DESCRIP NOT LIKE '%INV%'
      AND t.DEBIT IS NOT NULL) s
WHERE s.rnk = 1

This is an analytic function that rank your record by the values provided in the OVER clause.

PARTITION - is the group
ORDER BY - Who's the first of this group (first gets 1, second 2, ETC)

It is a lot more efficient then joins(Your problem could have been solved in many ways) , and uses the table only once.

answered Aug 29 '17 at 14:00

sagi

40,026
6
59
84

Thanks for you response! I'm going try it now. Havn't used PARTITION BY before – spyr0 Aug 29 '17 at 14:05
Thanks! I think it works! Confused for a moment because I didn't see the batch_id. Changed it to batch_date – spyr0 Aug 29 '17 at 14:09
No wait sorry. It seems to get rid of the selected credit as well. – spyr0 Aug 29 '17 at 14:12
1

No problem, though I think the `batch_id` should be replaced with `id` – sagi Aug 29 '17 at 14:12
@spyr0 , can you tell me the `ID`s of the rows you expect in your output ? That will help me understand what's not correct. – sagi Aug 29 '17 at 14:14
I'll go look for them now – spyr0 Aug 29 '17 at 14:14
id + header_id of course , to be specific. – sagi Aug 29 '17 at 14:15
Ok so the rows I want within V_TEST_GLBATCH_GJ have the same ID number. Their HEADER_ID numbers vary too. And just as I am typing this I realized I self-joined the second table without adding the condition of TAB2.ACC ='3648-00-0'. But I am interested in getting it working this way as it seems much more efficient. I'll update the question with a picture of the results I need from whats returned – spyr0 Aug 29 '17 at 14:24
1

@spyr0 I'm not sure I understood the edit. try explaining in words , like : "I want the biggest `HEADER_ID` for each `ID` , or "I want the biggest `HEADER_ID` for each combination of `ID,DATE` " Or "I want all the `ID`s associated with the .... ETC . I just don't understand why would 129016 need to be returned and 123238 – sagi Aug 29 '17 at 14:34
Sorry about that. I think it looks a little more confusing now but I'll try and clarify. In the picture first below the UPDATE the image shows the result of SELECT* FROM V_TEST_GLBATCH_GJ WHERE ACCT='3648-00-0'. So with the initial results I was getting after self-joining the tables I noticed the rows I wanted had the largest HEADER_ID for each date compared to others. But I also have noticed I made a mistake where I was joining TAB1 with ACCT='3648-00-0' and TAB2 without ACCT='3648-00-0. I'm going to highlight that in the question now. – spyr0 Aug 29 '17 at 14:42
I think I can resolve my question with the corrected SQL but I now think it probably would be more efficient to use PARTITION OVER. – spyr0 Aug 29 '17 at 14:47
But if you want the biggest one, how come you want both `129016` records? they have different header, one is biggest than the other. @spyr0 – sagi Aug 29 '17 at 14:52
It was with how the results were returned in the beginning. But I've noticed what I had to fix with the initial SQL query and I think I may continue to work with that. Thanks for your help anyways! – spyr0 Aug 29 '17 at 15:06

Oracle SQL: Get the max record for each duplicate ID in self join table

1 Answers1