0

I am using Oracle SQL Developer to query a database of cases, and I need to use three different identifiers to select the correct row for each case; however, my problem is that two of the identifiers are contained in the same text string, and I can’t figure out how to parse them to use in the query.

In the following table, the identifiers I need to use are:

  1. ID-1, characters 3–6 in the "Case" column (e.g., "1001"),
  2. ID-2, characters 8–9 in the "Case" column (e.g., "01") when they appear (treated as "00" if they don't appear), and
  3. ID-3, the value in the "Sequence" column (e.g., "672").

Source Table

Case Sequence Value 1 Value 2
AA1001 672 73 195
AA1001 711 73 185
AA1001-01 680 73 185
AA1001-02 685 72 185
AA1001-02 699 72 182
AB1002 676 51 36
AB1002-01 701 48 39
AB1002-01 719 48 35
AB1002-02 707 51 38
AA1003 655 122 416
AA1003 683 113 416

I want to return one row for each unique ID-1, such that first the greatest value for ID-2 is selected, and then the greatest value for ID-3 in that subset is chosen; so, the query should return only the following three rows from the table above.

Result Table

Case Sequence Value 1 Value 2
AA1001-02 699 72 182
AB1002-02 707 51 38
AA1003 683 113 416

I've tried taking the maximum for ID-2 using the following, but it only returns the rows where ID-2 equals "02".

SELECT * 
FROM table 
WHERE SUBSTR(Case,3,4) in ('1001','1002','1003') 
and SUBSTR(Case,8,2) = (SELECT MAX(SUBSTR(Case,8,2)) 
FROM table 
WHERE SUBSTR(Case,3,4) in ('1001','1002','1003'))

(The easiest answer is probably just to add a column for ID-2; however, the source database is strictly read-only, so I can't make that sort of change.)

MT0
  • 143,790
  • 11
  • 59
  • 117
perkins
  • 103
  • 2
  • 1
    Create your ID columns using sbstr in a CTE then select from your CTE. – Stu Oct 14 '21 at 15:08
  • Mostly a duplicate of https://stackoverflow.com/questions/121387/fetch-the-row-which-has-the-max-value-for-a-column you just need to handle the substrings. – MT0 Oct 14 '21 at 15:37
  • Obviously, in your attempt you aren't even considering the third ID - you are selecting row(s) with highest `ID-2` for each `ID-1`, but you are not further selecting just the row with the highest `SEQUENCE`. –  Oct 14 '21 at 17:02
  • 1
    Something else is odd though - you are only looking at the numeric part like 1001 and 1002. Don't the first two characters matter? Can't you have cases like AA1005-xx and AB1005-xx? (Same ID-1 of 1005, but the first two letters are **different**?) If this can happen, is your handling correct - you must MIX THEM TOGETHER? –  Oct 14 '21 at 17:16
  • @mathguy, I figured it was a waste of time to try to get the maximum Sequence when I couldn't yet find the highest ID-2. – perkins Oct 18 '21 at 13:36
  • Regarding your second comment, @mathguy, the first two characters are irrelevant for my purposes; AA1005 and AB1005 cannot both appear in the table. – perkins Oct 18 '21 at 13:40

2 Answers2

2

In Oracle 12.1 and higher, you can use the match_recognize clause:

select case, sequence, value1, value2
from   (
         select t.*, substr(case, 3, 4) as id_1, substr(case, 8, 2) as id_2
         from   table_name t
       )
match_recognize(
  partition by id_1
  order     by id_2 desc nulls last, sequence desc nulls last
  all rows per match
  pattern   (^ x)
  define    x as null is null
);


CASE        SEQUENCE     VALUE1     VALUE2
--------- ---------- ---------- ----------
AA1001-02        699         72        182
AB1002-02        707         51         38
AA1003           683        113        416
1

You can use the ROW_NUMBER() analytic function and find the SUBSTRings to partition/order by:

SELECT *
FROM   (
  SELECT t.*,
         ROW_NUMBER() OVER (
           PARTITION BY
             SUBSTR(Case, 3, 4)
           ORDER BY
             COALESCE(SUBSTR(Case, 8, 2), '00') DESC,
             Sequence DESC
         ) AS rn
  FROM   table_name t
)
WHERE rn = 1;

Which, for your sample data:

CREATE TABLE table_name (Case, Sequence, Value1, Value2 ) AS
SELECT 'AA1001',    672,  73, 195 FROM DUAL UNION ALL
SELECT 'AA1001',    711,  73, 185 FROM DUAL UNION ALL
SELECT 'AA1001-01', 680,  73, 185 FROM DUAL UNION ALL
SELECT 'AA1001-02', 685,  72, 185 FROM DUAL UNION ALL
SELECT 'AA1001-02', 699,  72, 182 FROM DUAL UNION ALL
SELECT 'AB1002',    676,  51,  36 FROM DUAL UNION ALL
SELECT 'AB1002-01', 701,  48,  39 FROM DUAL UNION ALL
SELECT 'AB1002-01', 719,  48,  35 FROM DUAL UNION ALL
SELECT 'AB1002-02', 707,  51,  38 FROM DUAL UNION ALL
SELECT 'AA1003',    655, 122, 416 FROM DUAL UNION ALL
SELECT 'AA1003',    683, 113, 416 FROM DUAL;

Outputs:

CASE SEQUENCE VALUE1 VALUE2 RN
AA1001-02 699 72 182 1
AB1002-02 707 51 38 1
AA1003 683 113 416 1

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
  • Or, instead of coalescing `'00'` when `ID-2` is missing, simply order by `ID-2 desc NULLS LAST`. Allow `ID-2` to be `NULL` even for ordering, just handle as needed. –  Oct 14 '21 at 16:47
  • Thank you for this; it works perfectly, but the other answer returns results about twice as fast for me. – perkins Oct 18 '21 at 13:42