I want to understand the following regex:

String.replaceAll("(.)(?=.*\\1)", "")

I understand that the first part, (.), captures a single character (any character) in a group, so that \\1 can refer back to it.

The other part, (?=.*\\1), is what confuses me. I read it as: the previous character 1 or 0 times, followed by = (I'm not sure what = does here), followed by any character 1 or more times, followed by a back reference.

If I input hello12hel, it removes the duplicates. Can you explain how it matches the duplicates?

My other question is why a group is required around ?=.*\\1. The regex fails when the parentheses are left out (i.e. if I do String.replaceAll((.)?=.*\\1) ). Also, why can't we use $1 instead of \\1?

Help123
  • `(?=.*\\1)` is a look-ahead that checks whether there is the same character in the rest of the string, after you match and capture the character in `(.)`. Look-ahead doesn't consume the string, so the next match can occur on the next index, instead of where you find `\1`. With this method, all the instances of a character will be removed, except for the last instance. – nhahtdh Jul 21 '15 at 03:20
  • `$1` is the syntax in the replacement string. It can't be used in the regular expression. – nhahtdh Jul 21 '15 at 03:21
  • Thanks! I referred to the regex syntax reference on Stack Overflow and it had what you mentioned. – Help123 Jul 21 '15 at 04:01
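
To make the comments above concrete, here is a minimal sketch in Java. The class name `RegexDedupDemo` and both method names are made up for illustration; only `String.replaceAll` and the regex from the question come from the original post. The first method applies the look-ahead pattern, and the second shows `\\1` being used inside the pattern while `$1` is used inside the replacement string.

```java
public class RegexDedupDemo {
    // Hypothetical helper: deletes every character that occurs again later
    // in the string, so only the last occurrence of each character remains.
    // (.) captures one character; (?=.*\\1) is a look-ahead that succeeds
    // only if the same character appears somewhere to the right. The
    // look-ahead consumes nothing, so the next match starts at the very
    // next index.
    static String keepLastOccurrences(String input) {
        return input.replaceAll("(.)(?=.*\\1)", "");
    }

    // Hypothetical helper: \\1 is the back reference inside the pattern,
    // while $1 refers to the same group inside the replacement string.
    static String bracketDoubledLetters(String input) {
        return input.replaceAll("(.)\\1", "[$1$1]");
    }

    public static void main(String[] args) {
        // The first h, e, l, l each reappear later and are removed;
        // o, 1, 2 and the final h, e, l have no later duplicate.
        System.out.println(keepLastOccurrences("hello12hel")); // o12hel

        // "ll" matches (.)\\1 and is wrapped in brackets.
        System.out.println(bracketDoubledLetters("hello")); // he[ll]o
    }
}
```

Note that the surviving characters keep the order of their last occurrences, which is why the result starts with `o` rather than `h`.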