Oracle Query - Join with comma separated data

Question

Table Name : crm_mrdetails

 id | mr_name | me_email     | mr_mobile  | mr_doctor|
 -----------------------------------------------------
 1  | John    |abc@gmail.com | 1234555555 | ,1,2,3   |

Table Name : crm_mr_doctor

id | dr_name     | specialization|  
----------------------------------  
1  | Abhishek    | cordiologist  |
2  | Krishnan    | Physician     |
3  | Krishnan    | Nurse         |

The concatenated values in mrdetails.mr_doctor are the foreign keys for mr_doctor.id. I need to join on them to produce output like this:

 id | mr_name | me_email     |Doctor_Specialization|
 -------------------------------------------------
 1  | John    |abc@gmail.com |cordiologist,Physician,Nurse|

I'm new to Oracle, I'm using Oracle 12C. Any help much appreciated.

How can I join both tables, the one I have provided sample, join has to be done on same column. — Koti Raavi, Jul 24 '17 at 11:08
You can't cross join here, its not my model, someone has created we need to use as it is. Join should be happened on Mr_Doctor and ID — Koti Raavi, Jul 24 '17 at 11:32
The front-end you use to interact with the database is SQL Developer, and its most current version is 4.3 (I think). 12c is the version of your Oracle Database, a different software product. SQL Developer is irrelevant to your question; the database version is. I edited your tags and I will edit the question to reflect this. — , Jul 24 '17 at 11:46
Just to be clear: The input string in column `mr_doctor` begins with a comma (before the number 1), but the output should not begin with a comma? — , Jul 24 '17 at 11:48
Yes correct, comma is exist (Leading and Trailing), I don't know why data model created like this. The one I have provided example — Koti Raavi, Jul 24 '17 at 11:51
APC: Okay, so the concatenated values in mrdetails.mr_doctor are the foreign keys for mr_doctor.id? Yes, I know Data model is not as expected, sorry I have to use no option — Koti Raavi, Jul 24 '17 at 11:52

APC · Answer 1 · 2019-10-28T13:07:49.180

First of all we must acknowledge that is a bad data model. The column mr_doctor violates First Normal Form. This is not some abstruse theoretical point. Not being in 1NF means we must write more code to lookup the meaning of the keys instead of using standard SQL join syntax. It also means we cannot depend on the column containing valid IDs: mr_doctor can contain any old nonsense and we must write a query which will can handle that. See Is storing a delimited list in a database column really that bad? for more on this.

Anyway. Here is a solution which uses regular expressions to split the mr_doctor column into IDs and then joins them to the mr_doctor table. The specialization column is concatenated to produce the required output.

select mrdet.id, 
       mrdet.mr_name,
       mrdet.me_email,
       listagg(mrdoc.specialization, ',') 
                     within group (order by mrdoc.specialization) as doctor_specialization
from mr_details mrdet
     join (
        select distinct id, 
               regexp_substr(mr_doctor, '(,?)([0-9]+)(,?)', 1, level, null, 2) as dr_id
        from mr_details 
        connect by level <= regexp_count(mr_doctor, '(,?)([0-9]+)')
       ) as mrids
    on mrids.id = mrdet.id
    left outer join mr_doctor mrdoc
       on mrids.dr_id = mr_doc.id
group by mrdet.id, 
       mrdet.mr_name,
       mrdet.me_email
/

This solution is reasonably resilient despite the data model being brittle. It will return results if the string has too many commas, or spaces. It will ignore values which are letters or otherwise aren't numbers. It won't hurl if the extracted number doesn't match an ID in the mr_doctor table. Obviously the results are untrustworthy for those reasons, but that's part of the price of a shonky data model.

Can you please explain the following: (,?)([0-9]+)(,?)

The pattern matches zero or one comma followed by one or more digits followed by zero or one comma. Perhaps the (,?) in the matched patterns aren't strictly necessary. However, without them, this string 2 3 4 would match the same three IDs as this string 2,3,4. Maybe that's correct maybe it isn't. When the foreign keys are stored in a CSV column instead of being enforced through a proper constraint what does 'correct' even mean?

Hi @APC, Can you please explain the following: `(,?)([0-9]+)(,?)` you used in SUBSTR. I am not able to get my head around it. The first part `(,?)` means 0 or more occurrence of question mark in the beginning and the last means 0 or more occurrence of question mark in the end however if we consider 0 in both cases and try to match numbers then ,99, should be consider as 9 & 9 separately! Now i bring my attention to the + inside the range `[0-9+]` which means 1 or more occurrence. It seems kinda confusing to me even with the last argument as 2. I am new to regexp_substr ! — pOrinG, Jul 24 '17 at 16:16
@pOrinG - Good question. I have expanded my answer. The thing is, I have had bad experiences using regex for data cleansing so I tend to be verbose and over-specify the matches. Many regex practitioners favour brevity and elegance. There are arguments on both sides. — APC, Jul 24 '17 at 16:54

score 2 · Answer 2 · answered Jul 24 '17 at 12:05

You have to split data in mr_doctor column into rows, join table crm_mrdoctor and then use listagg(). How to split data? Splitting string into multiple rows in Oracle

select t.id, max(mr_name) mr_name, 
       listagg(specialization, ', ') within group (order by rn) specs
  from (
    select id, mr_name, levels.column_value rn, 
           trim(regexp_substr(mr_doctor, '[^,]+', 1, levels.column_value)) as did
      from crm_mrdetails t,
           table(cast(multiset(select level 
                                 from dual 
                                 connect by level <= 
                                     length(regexp_replace(t.mr_doctor, '[^,]+')) + 1) 
                      as sys.odcinumberlist)) levels) t
  left join crm_mr_doctor d on t.did = d.id
  group by t.id

Demo and result:

with crm_mrdetails (id, mr_name, mr_doctor) as (
    select 1, 'John', ',1,2,3'   from dual union all
    select 2, 'Anne', ',4,2,6,5' from dual union all
    select 3, 'Dave', ',4'       from dual),
crm_mr_doctor (id, dr_name, specialization) as (
    select 1, 'Abhishek', 'cordiologist' from dual union all
    select 2, 'Krishnan', 'Physician'    from dual union all
    select 3, 'Krishnan', 'Nurse'        from dual union all
    select 4, 'Krishnan', 'Onkologist'   from dual union all
    select 5, 'Krishnan', 'Surgeon'      from dual union all
    select 6, 'Krishnan', 'Nurse'        from dual
    )
select t.id, max(mr_name) mr_name, 
       listagg(specialization, ', ') within group (order by rn) specs
  from (
    select id, mr_name, levels.column_value rn, 
           trim(regexp_substr(mr_doctor, '[^,]+', 1, levels.column_value)) as did
      from crm_mrdetails t,
           table(cast(multiset(select level 
                                 from dual 
                                 connect by level <= 
                                     length(regexp_replace(t.mr_doctor, '[^,]+')) + 1) 
                      as sys.odcinumberlist)) levels) t
  left join crm_mr_doctor d on t.did = d.id
  group by t.id

Output:

    ID MR_NAME SPECS
------ ------- -------------------------------------
     1 John    cordiologist, Physician, Nurse
     2 Anne    Onkologist, Physician, Nurse, Surgeon
     3 Dave    Onkologist

Why t is used two times? – Syed Uzair Uddin Nov 16 '18 at 13:28 — Syed Uzair Uddin, Nov 16 '18 at 13:28

score 1 · Answer 3 · answered Oct 28 '19 at 13:21

You can use a recursive sub-query and simple string functions (which may be faster than using regular expressions and a correlated hierarchical query):

Oracle Setup:

CREATE TABLE crm_mrdetails (id, mr_name, mr_doctor) as
    select 1, 'John', ',1,2,3'   from dual union all
    select 2, 'Anne', ',4,2,6,5' from dual union all
    select 3, 'Dave', ',4'       from dual;

CREATE TABLE crm_mr_doctor (id, dr_name, specialization) as
    select 1, 'Abhishek', 'cordiologist' from dual union all
    select 2, 'Krishnan', 'Physician'    from dual union all
    select 3, 'Krishnan', 'Nurse'        from dual union all
    select 4, 'Krishnan', 'Onkologist'   from dual union all
    select 5, 'Krishnan', 'Surgeon'      from dual union all
    select 6, 'Krishnan', 'Nurse'        from dual;

Query:

WITH crm_mrdetails_bounds ( id, mr_name, mr_doctor, start_pos, end_pos ) AS (
  SELECT id,
         mr_name,
         mr_doctor,
         2,
         INSTR( mr_doctor, ',', 2 )
  FROM   crm_mrdetails
UNION ALL
  SELECT id,
         mr_name,
         mr_doctor,
         end_pos + 1,
         INSTR( mr_doctor, ',', end_pos + 1 )
  FROM   crm_mrdetails_bounds
  WHERE  end_pos > 0
),
crm_mrdetails_specs ( id, mr_name, start_pos, specialization_id ) AS (
  SELECT id,
         mr_name,
         start_pos,
         TO_NUMBER(
           CASE end_pos
           WHEN 0
           THEN SUBSTR( mr_doctor, start_pos )
           ELSE SUBSTR( mr_doctor, start_pos, end_pos - start_pos )
           END
         )
  FROM   crm_mrdetails_bounds
)
SELECT s.id,
       MAX( s.mr_name ) AS mr_name,
       LISTAGG( d.specialization, ',' )
         WITHIN GROUP ( ORDER BY s.start_pos )
         AS doctor_specialization
FROM   crm_mrdetails_specs s
       INNER JOIN crm_mr_doctor d
       ON ( s.specialization_id = d.id )
GROUP BY s.id

Output:

ID | MR_NAME | DOCTOR_SPECIALIZATION             
-: | :------ | :---------------------------------
 1 | John    | cordiologist,Physician,Nurse      
 2 | Anne    | Onkologist,Physician,Nurse,Surgeon
 3 | Dave    | Onkologist

db<>fiddle here

pOrinG · Answer 4 · 2017-07-24T11:56:39.783

Please change the column names according to your requirement.

CREATE OR REPLACE Function ReplaceSpec
    (String_Inside IN Varchar2)
    Return Varchar2 Is

        outputString Varchar2(5000);
        tempOutputString crm_doc.specialization%TYPE;

    Begin

        FOR i in 1..(LENGTH(String_Inside)-LENGTH(REPLACE(String_Inside,',',''))+1)
        LOOP

            Select specialization into tempOutputString From crm_doc 
            Where id = PARSING_STRING(String_Inside,i);

            If i != 1 Then
                outputString := outputString || ',';
            end if;
            outputString := outputString || tempOutputString;

        END LOOP;

        Return outputString;


    End;
/

The Parsing_String function to help split the comma separated values.

CREATE OR REPLACE Function PARSING_STRING
(String_Inside IN Varchar2, Position_No IN Number) 
Return Varchar2 Is
    OurEnd   Number; Beginn Number;
Begin

If Position_No < 1 Then 
Return Null; 
End If;

OurEnd := Instr(String_Inside, ',', 1, Position_No);

If OurEnd = 0 Then
    OurEnd := Length(String_Inside) + 1;
End If;

If Position_No = 1 Then
    Beginn := 1;
Else
    Beginn := Instr(String_Inside, ',', 1, Position_No-1) + 1;
End If;

Return Substr(String_Inside, Beginn, OurEnd-Beginn);

End;
/

Please note that I have given only a basic function to get your output. You might need to add some exceptions etc.

Eg. When the doc_id [mr_doctor] is empty, what to do.

Usage

select t1.*,ReplaceSpec(doc_id) from crm_details t1

if your mr_doctor data always starts with a comma use:

Select t1.*,ReplaceSpec(Substr(doc_id,2)) from crm_details t1

lot of work to recreate the wheel [LISTAGG](https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions089.htm#SQLRF30030) — Jorge Campos, Jul 24 '17 at 11:52
@JorgeCampos I couldn't imagine a workflow where just LISTAGG and a single query could do the entire required work. I could be wrong !! — pOrinG, Jul 24 '17 at 11:57
I have read only access, I can't create any tables/procudures/functions, fetching data into SQL and reporting it — Koti Raavi, Jul 24 '17 at 12:01
@APC I am actually happy that someone did it. Will help me grow. Cheers! — pOrinG, Jul 24 '17 at 12:39

score 0 · Answer 5 · answered Oct 28 '19 at 13:23

Please go through https://oracle-base.com/articles/misc/string-aggregation-techniques String Aggregation Techniques

or

SELECT deptno, LTRIM(MAX(SYS_CONNECT_BY_PATH(ename,',')) KEEP (DENSE_RANK LAST ORDER BY curr),',') AS employees FROM (SELECT deptno, ename, ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ename) AS curr, ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ename) -1 AS prev FROM emp) GROUP BY deptno CONNECT BY prev = PRIOR curr AND deptno = PRIOR deptno START WITH curr = 1

or

listagg and wm_concat an also be used as other people have used it

Can you format the code (i.e. query) in your question as code? — Itamar Mushkin, Oct 28 '19 at 13:45

score -1 · Answer 6 · answered Nov 27 '17 at 13:56

How about this one? I have not tested it, so there could be any syntax error.

select id,mr_name,me_email,listagg(specialization,',') within group (order by specialization) as Doctor_Specialization
from
(select dtls.id,dtls.mr_name,dtls.me_email,dr.specialization
from crm_mrdetails dtls,
crm_mr_doctor dr
where INSTR(','||dtls.mr_doctor||',' , ','||dr.id||',') > 0
) group by id,mr_name,me_email;

Oracle Query - Join with comma separated data

6 Answers6

Linked

Related