Remove duplicate from strings in sql

Question

I have 2 cols ID Value

ab^bc^ab^de
mn^mn^op

I want the output as ID Value

ab^bc^de
mn^op

Can someone please help me in this.✋ I have around 500 rows in the table.

I tried using stuff and other ways but errors are popping up.

MT0 · Answer 1 · 2022-11-29T09:39:52.380

You can use a recursive query and simple string functions (which is much faster than regular expressions, but a little more to type) to split the string and then, in later Oracle versions, can re-aggregate it using LISTAGG(DISTINCT ...:

WITH bounds ( rid, value, spos, epos ) AS (
  SELECT ROWID, value, 1, INSTR(value, '^', 1)
  FROM   table_name
UNION ALL
  SELECT rid, value, epos + 1, INSTR(value, '^', epos + 1)
  FROM   bounds
  WHERE  epos > 0
)
SELECT LISTAGG(
         DISTINCT 
         CASE epos
         WHEN 0
         THEN SUBSTR(value, spos)
         ELSE SUBSTR(value, spos, epos - spos)
         END,
         '^'
       ) WITHIN GROUP (ORDER BY spos) AS unique_values
FROM   bounds
GROUP BY rid;

Which, for the sample data:

CREATE TABLE table_name (value) AS
SELECT 'ab^bc^ab^de' FROM DUAL UNION ALL
SELECT 'mn^mn^op' FROM DUAL UNION ALL
SELECT 'ab^bc^ab^de' FROM DUAL UNION ALL
SELECT 'one^two^three^one^two^one^four' FROM DUAL;

Outputs:

UNIQUE_VALUES
ab^bc^de
mn^op
ab^bc^de
one^two^three^four

If you are using earlier versions or Oracle that do not support DISTINCT in the LISTAGG then you can aggregate twice:

WITH bounds ( rid, value, spos, epos ) AS (
  SELECT ROWID, value, 1, INSTR(value, '^', 1)
  FROM   table_name
UNION ALL
  SELECT rid, value, epos + 1, INSTR(value, '^', epos + 1)
  FROM   bounds
  WHERE  epos > 0
),
words (rid, word, spos) AS (
  SELECT rid,
         CASE epos
         WHEN 0
         THEN SUBSTR(value, spos)
         ELSE SUBSTR(value, spos, epos - spos)
         END,
         spos
  FROM   bounds
),
unique_words ( rid, word, spos ) AS (
  SELECT rid,
         word,
         MIN(spos)
  FROM   words
  GROUP BY rid, word
)
SELECT LISTAGG(word, '^') WITHIN GROUP (ORDER BY spos) AS unique_values
FROM   unique_words
GROUP BY rid;

Which gives the same output.

fiddle

score 0 · Answer 2 · answered Nov 29 '22 at 08:59

For example:

Sample data:

SQL> with
  2  test (col) as
  3    (select 'ab^bc^ab^de' from dual union all
  4     select 'mn^mn^op'    from dual
  5    ),

Split values into rows:

  6  temp as
  7    (select
  8        col,
  9        regexp_substr(col, '[^\^]+', 1, column_value) val,
 10        column_value lvl
 11     from test cross join
 12       table(cast(multiset(select level from dual
 13                           connect by level <= regexp_count(col, '\^') + 1
 14                          ) as sys.odcinumberlist))
 15    )

Aggregate them back, using only distinct values:

 16  select col,
 17         listagg(val, '^') within group (order by lvl) as result
 18  from (select col, val, min(lvl) lvl
 19        from temp
 20        group by col, val
 21       )
 22  group by col;

COL         RESULT
----------- --------------------
ab^bc^ab^de ab^bc^de
mn^mn^op    mn^op

SQL>

If you have CROSS APPPLY instead of CROSS JOIN, you don't even need the "table(cast(multiset(", you have to alias "level" : "select level as column_value from dual connect by ...". — p3consulting, Nov 29 '22 at 18:58

score 0 · Answer 3 · answered Nov 29 '22 at 09:16

Other solutions if your ORACLE version is recent enough to have LISTAGG DISTINCT:

with data(s) as (
    select 'ab^bc^ab^de' from dual union all
    select 'mn^mn^op' from dual
),
splitted(s, l, r) as (
    select s, level, regexp_substr(s,'[^\^]+',1,level) from data
    connect by regexp_substr(s,'[^\^]+',1,level) is not null and s = prior s and prior sys_guid() is not null
)
select s, listagg(distinct r, '^') within group(order by l) as r  from splitted
group by s
;

And better if you have a PK, use it:

with data(id, s) as (
    select 1, 'ab^bc^ab^de' from dual union all
    select 2, 'mn^mn^op' from dual
),
splitted(id, l, r) as (
    select id, level, regexp_substr(s,'[^\^]+',1,level) from data
    connect by regexp_substr(s,'[^\^]+',1,level) is not null and id = prior id and prior sys_guid() is not null
)
select id, listagg(distinct r, '^') within group(order by l) as r  from splitted
group by id
;

This is much less efficient than using a hierarchy inside a `MULTISET` (since it also has to generate a GUID for each row and correlate the levels of the hierarchy) which, in turn, is less efficient than using a recursive query and simple string functions (see [this answer](https://stackoverflow.com/q/38371989/1509264) for relative performance) — MT0, Nov 29 '22 at 09:36

score 0 · Answer 4 · answered Nov 29 '22 at 11:00

For the fun, using XML:

with data(s) as (
    select 'ab^bc^ab^de' from dual union all
    select 'mn^mn^op' from dual
)
select * 
from data, 
    xmltable( 
        q'{string-join( for $atom in distinct-values((ora:tokenize($X,"\^"))) order by $atom return $atom, "^" )}'
        passing s as "X"
        columns 
            column_value varchar2(64) path '.'
        )
;

(or fn:tokenize, depending on the DB version)

Remove duplicate from strings in sql

4 Answers4