-1

Following two query gives some output,

 Select REGEXP_REPLACE('Milind,Milind,Gopal,Gopal,Gopal,Milind'), '([^,]+) (,\1)+(,|$)', '\1\3') "OUTPUT"
FROM dual;

o/p Milind,Gopal,Milind


Select REGEXP_REPLACE('Milind,Milind,Gopal,Gopal,Gopal,Milind'), '([^,]+) (,\1+)(,|$)', '\1\3') "OUTPUT"
FROM dual;

o/p Milind,Gopal,Gopal,Milind

I do not understand the workflow. How it is giving output. Can anyone help me out and explain.

Say_milli
  • 21
  • 3
  • The first RegEx removes *one* following name which matches the previous name. #2 removes *all* following names. you can check RegExes on rexex101.com to see how they work: https://regex101.com/r/Xutku1/1 – dnoeth Apr 09 '19 at 15:15
  • These patterns are regular NFA expressions. See https://regex101.com/r/USv2Et/1 and https://regex101.com/r/USv2Et/2 for explanation. The main thing here is a *repeated capturing group*, `(,\1)+`. You should avoid quantifying groups like this when you intend to quantify an atom inside the group. – Wiktor Stribiżew Apr 09 '19 at 17:38

1 Answers1

1

[^,] matches one character, that can be anything except a comma. For example: M.

[^,]+ matches 1 or more times any character except a comma. For example: Milind

([^,]+) same, but capture the result, so it can be referenced later with \1.

(,\1)+ matches a comma followed by the previous capture, one or more times, and capture the result as \2. If \1 is Milind, it can match: ,Milind, or ,Milind,Milind or ,Milind,Milind,Milind, etc.

(,|$) matches either a comma or the end of the line, and captures it as \3.

\1\3 This is the replacement pattern: we are only keeping \1 and \3, so everything matched in capture \2 is effectively removed.

The second statement is showing a small difference that introduces a bug:

(,\1+) matches a comma followed by one or more occurences of the previous capture, as in: ,Milind, or ,MilindMilind or ,MilindMilindMilind. As a result, it fails to remove several comma separated occurences.

Jerome
  • 2,350
  • 14
  • 25