How can I select rows in an SQL table by using multiple substrings of a single column value?

Question

I am using Oracle SQL Developer to query a database of cases, and I need to use three different identifiers to select the correct row for each case; however, my problem is that two of the identifiers are contained in the same text string, and I can’t figure out how to parse them to use in the query.

In the following table, the identifiers I need to use are:

ID-1, characters 3–6 in the "Case" column (e.g., "1001"),
ID-2, characters 8–9 in the "Case" column (e.g., "01") when they appear (treated as "00" if they don't appear), and
ID-3, the value in the "Sequence" column (e.g., "672").

Source Table

Case	Sequence	Value 1	Value 2
AA1001	672	73	195
AA1001	711	73	185
AA1001-01	680	73	185
AA1001-02	685	72	185
AA1001-02	699	72	182
AB1002	676	51	36
AB1002-01	701	48	39
AB1002-01	719	48	35
AB1002-02	707	51	38
AA1003	655	122	416
AA1003	683	113	416

I want to return one row for each unique ID-1, such that first the greatest value for ID-2 is selected, and then the greatest value for ID-3 in that subset is chosen; so, the query should return only the following three rows from the table above.

Result Table

Case	Sequence	Value 1	Value 2
AA1001-02	699	72	182
AB1002-02	707	51	38
AA1003	683	113	416

I've tried taking the maximum for ID-2 using the following, but it only returns the rows where ID-2 equals "02".

SELECT * 
FROM table 
WHERE SUBSTR(Case,3,4) in ('1001','1002','1003') 
and SUBSTR(Case,8,2) = (SELECT MAX(SUBSTR(Case,8,2)) 
FROM table 
WHERE SUBSTR(Case,3,4) in ('1001','1002','1003'))

(The easiest answer is probably just to add a column for ID-2; however, the source database is strictly read-only, so I can't make that sort of change.)

Create your ID columns using sbstr in a CTE then select from your CTE. — Stu, Oct 14 '21 at 15:08
Mostly a duplicate of https://stackoverflow.com/questions/121387/fetch-the-row-which-has-the-max-value-for-a-column you just need to handle the substrings. — MT0, Oct 14 '21 at 15:37
Obviously, in your attempt you aren't even considering the third ID - you are selecting row(s) with highest `ID-2` for each `ID-1`, but you are not further selecting just the row with the highest `SEQUENCE`. — , Oct 14 '21 at 17:02
Something else is odd though - you are only looking at the numeric part like 1001 and 1002. Don't the first two characters matter? Can't you have cases like AA1005-xx and AB1005-xx? (Same ID-1 of 1005, but the first two letters are **different**?) If this can happen, is your handling correct - you must MIX THEM TOGETHER? — , Oct 14 '21 at 17:16
@mathguy, I figured it was a waste of time to try to get the maximum Sequence when I couldn't yet find the highest ID-2. — perkins, Oct 18 '21 at 13:36
Regarding your second comment, @mathguy, the first two characters are irrelevant for my purposes; AA1005 and AB1005 cannot both appear in the table. — perkins, Oct 18 '21 at 13:40

score 2 · Accepted Answer · 2021-10-18T14:01:16.147

In Oracle 12.1 and higher, you can use the match_recognize clause:

select case, sequence, value1, value2
from   (
         select t.*, substr(case, 3, 4) as id_1, substr(case, 8, 2) as id_2
         from   table_name t
       )
match_recognize(
  partition by id_1
  order     by id_2 desc nulls last, sequence desc nulls last
  all rows per match
  pattern   (^ x)
  define    x as null is null
);


CASE        SEQUENCE     VALUE1     VALUE2
--------- ---------- ---------- ----------
AA1001-02        699         72        182
AB1002-02        707         51         38
AA1003           683        113        416

Thank you; `match_recognize` is new to me, so this is extremely helpful. — perkins, Oct 18 '21 at 13:47

score 1 · Answer 2 · edited Oct 14 '21 at 17:09

You can use the ROW_NUMBER() analytic function and find the SUBSTRings to partition/order by:

SELECT *
FROM   (
  SELECT t.*,
         ROW_NUMBER() OVER (
           PARTITION BY
             SUBSTR(Case, 3, 4)
           ORDER BY
             COALESCE(SUBSTR(Case, 8, 2), '00') DESC,
             Sequence DESC
         ) AS rn
  FROM   table_name t
)
WHERE rn = 1;

Which, for your sample data:

CREATE TABLE table_name (Case, Sequence, Value1, Value2 ) AS
SELECT 'AA1001',    672,  73, 195 FROM DUAL UNION ALL
SELECT 'AA1001',    711,  73, 185 FROM DUAL UNION ALL
SELECT 'AA1001-01', 680,  73, 185 FROM DUAL UNION ALL
SELECT 'AA1001-02', 685,  72, 185 FROM DUAL UNION ALL
SELECT 'AA1001-02', 699,  72, 182 FROM DUAL UNION ALL
SELECT 'AB1002',    676,  51,  36 FROM DUAL UNION ALL
SELECT 'AB1002-01', 701,  48,  39 FROM DUAL UNION ALL
SELECT 'AB1002-01', 719,  48,  35 FROM DUAL UNION ALL
SELECT 'AB1002-02', 707,  51,  38 FROM DUAL UNION ALL
SELECT 'AA1003',    655, 122, 416 FROM DUAL UNION ALL
SELECT 'AA1003',    683, 113, 416 FROM DUAL;

Outputs:

CASE SEQUENCE VALUE1 VALUE2 RN

AA1001-02 699 72 182 1

AB1002-02 707 51 38 1

AA1003 683 113 416 1

db<>fiddle here

Or, instead of coalescing `'00'` when `ID-2` is missing, simply order by `ID-2 desc NULLS LAST`. Allow `ID-2` to be `NULL` even for ordering, just handle as needed. — , Oct 14 '21 at 16:47
Thank you for this; it works perfectly, but the other answer returns results about twice as fast for me. — perkins, Oct 18 '21 at 13:42

How can I select rows in an SQL table by using multiple substrings of a single column value?

Source Table

Result Table

2 Answers2