Say you have a string, abcabc, and you want to figure out whether its first half matches its second half. You can do this with a single regex by using a capturing group and a backreference. Here is the regex I would use:
(.+)\1
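The way this works is that .+ matches any sequence of one or more characters. Because it is in parentheses, whatever it matches is captured in a group. \1 is a backreference to the first capturing group, so it matches exactly the text that the group captured. After some backtracking (.+ first grabs the whole string, then gives characters back one at a time until \1 can match), the capturing group matches the first half of the string, abc. The backreference \1 now stands for abc, so it matches the second half of the string. The entire string is now matched, which confirms that the first half matches the second half. Note that this only checks the whole string if the regex is anchored, as in ^(.+)\1$, or used with a "full match" function. Unanchored, (.+)\1 will also match a repeated part inside a longer string, such as the abcabc in xabcabc.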
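A minimal sketch in Python's re module, assuming you want a yes/no answer for the whole string. re.fullmatch anchors the pattern at both ends, which is equivalent to ^(.+)\1$; the helper name halves_match is just for illustration.

```python
import re

def halves_match(s: str) -> bool:
    # fullmatch requires the pattern to cover the whole string, like ^(.+)\1$
    return re.fullmatch(r"(.+)\1", s) is not None

# After backtracking, group 1 holds the first half
m = re.fullmatch(r"(.+)\1", "abcabc")
print(m.group(1))  # abc

print(halves_match("abcabc"))   # True
print(halves_match("abcabd"))   # False
print(halves_match("xabcabc"))  # False: not two copies of one part

# Unanchored, search() finds a repeated part inside a longer string
print(re.search(r"(.+)\1", "xabcabc").group(0))  # abcabc
```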
Another use of backreferences is in replacing. Say you want to replace every {...} with [...], but only if the text inside { and } is all digits. You can easily do this with a capturing group and a backreference, using the regex
{(\d+)}
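and replacing it with [\1]. (Some flavors, such as JavaScript's String.prototype.replace, write the backreference in the replacement string as $1 instead.)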
The regex matches {123} in the string abc {123} 456 and captures 123 in the first capturing group. The backreference \1 now stands for 123, so replacing {(\d+)} in abc {123} 456 with [\1] results in abc [123] 456.
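Here is a short sketch of the same replacement with Python's re.sub, which accepts \1 in the replacement string. The braces are escaped as \{ and \}: Python treats an unescaped { that isn't a quantifier as a literal, but escaping is safer across regex flavors. The function name brackets_for_digits is hypothetical.

```python
import re

# Braces containing only digits; group 1 captures the digits
pattern = re.compile(r"\{(\d+)\}")

def brackets_for_digits(text: str) -> str:
    # \1 in the replacement inserts whatever group 1 captured
    return pattern.sub(r"[\1]", text)

print(brackets_for_digits("abc {123} 456"))       # abc [123] 456
print(brackets_for_digits("{12} {ab} {3x} {7}"))  # [12] {ab} {3x} [7]
```

Braces containing anything other than digits, such as {ab} or {3x}, are left alone because \d+ followed by \} cannot match them.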
Non-capturing groups exist because groups have more uses than just capturing. The regex (xyz)+ matches one or more repetitions of the whole group xyz, such as xyzxyzxyz. A group is needed because xyz+ only matches xy followed by one or more z characters, e.g. xyzzzzz. A non-capturing group, written (?:...), gives you the grouping without the capture: (?:xyz)+ matches the same strings as (xyz)+.
. The downside of capturing groups is that they can be slightly less efficient than non-capturing groups, and each one takes up a group number. If you have a complicated regex with a lot of groups in it but only need to reference a single one somewhere in the middle, it's much easier to make the other groups non-capturing and reference the one you want as \1, rather than counting all the groups up to it.
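As an illustration, here is a hypothetical, deliberately simplified pattern for a single XML-like element with an optional namespace prefix and attributes (a sketch, not a real parser). Only the tag name is captured, so it is group 1 and the closing tag can use \1. With every group capturing, the tag name becomes group 2 and the backreference has to be counted out as \2.

```python
import re

# Prefix and attributes are grouped with (?:...), so the tag name is group 1
NON_CAPTURING = r'<(?:\w+:)?(\w+)(?:\s+\w+="[^"]*")*>.*?</(?:\w+:)?\1>'

# Same structure with every group capturing: the tag name shifts to group 2
ALL_CAPTURING = r'<(\w+:)?(\w+)(\s+\w+="[^"]*")*>.*?</(\w+:)?\2>'

sample = '<svg:rect width="5">x</svg:rect>'

print(re.fullmatch(NON_CAPTURING, sample).groups())
# ('rect',)
print(re.fullmatch(ALL_CAPTURING, sample).groups())
# ('svg:', 'rect', ' width="5"', 'svg:')

# Mismatched opening and closing tags are rejected by the backreference
print(re.fullmatch(NON_CAPTURING, "<b>x</i>"))  # None
```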
I hope this helps!