Explanation in REGEXP_REPLACE function output

Question

Following two query gives some output,

 Select REGEXP_REPLACE('Milind,Milind,Gopal,Gopal,Gopal,Milind'), '([^,]+) (,\1)+(,|$)', '\1\3') "OUTPUT"
FROM dual;

o/p Milind,Gopal,Milind


Select REGEXP_REPLACE('Milind,Milind,Gopal,Gopal,Gopal,Milind'), '([^,]+) (,\1+)(,|$)', '\1\3') "OUTPUT"
FROM dual;

o/p Milind,Gopal,Gopal,Milind

I do not understand the workflow. How it is giving output. Can anyone help me out and explain.

The first RegEx removes *one* following name which matches the previous name. #2 removes *all* following names. you can check RegExes on rexex101.com to see how they work: https://regex101.com/r/Xutku1/1 — dnoeth, Apr 09 '19 at 15:15
These patterns are regular NFA expressions. See https://regex101.com/r/USv2Et/1 and https://regex101.com/r/USv2Et/2 for explanation. The main thing here is a *repeated capturing group*, `(,\1)+`. You should avoid quantifying groups like this when you intend to quantify an atom inside the group. — Wiktor Stribiżew, Apr 09 '19 at 17:38

score 1 · Answer 1 · answered Apr 09 '19 at 15:55

[^,] matches one character, that can be anything except a comma. For example: M.

[^,]+ matches 1 or more times any character except a comma. For example: Milind

([^,]+) same, but capture the result, so it can be referenced later with \1.

(,\1)+ matches a comma followed by the previous capture, one or more times, and capture the result as \2. If \1 is Milind, it can match: ,Milind, or ,Milind,Milind or ,Milind,Milind,Milind, etc.

(,|$) matches either a comma or the end of the line, and captures it as \3.

\1\3 This is the replacement pattern: we are only keeping \1 and \3, so everything matched in capture \2 is effectively removed.

The second statement is showing a small difference that introduces a bug:

(,\1+) matches a comma followed by one or more occurences of the previous capture, as in: ,Milind, or ,MilindMilind or ,MilindMilindMilind. As a result, it fails to remove several comma separated occurences.

Thanks Jerome. Helped a lot in seeing and understanding the workflow. You are awesome. — Say_milli, Apr 10 '19 at 09:03

Explanation in REGEXP_REPLACE function output

1 Answers1