I am working on a problem that requires removing duplicated words from a string. E.g.,
Input: Goodbye bye bye world world world
Output: Goodbye bye world
I found a working pattern in online resources, but I don't understand every part of it.
String pattern = "\\b(\\w+)(\\b\\W+\\b\\1\\b)*";
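For context, here is a minimal sketch of how I would expect the pattern to be applied. It assumes the replacement is done with replaceAll and "$1" (the first captured group); the class and method names are only illustrative and are not from the resource I found.

```java
class DedupSketch {
    // Pattern copied verbatim from the question.
    static final String PATTERN = "\\b(\\w+)(\\b\\W+\\b\\1\\b)*";

    // Assumption: every match is replaced by its first captured group ($1),
    // i.e. the first occurrence of the word.
    static String removeRepeats(String input) {
        return input.replaceAll(PATTERN, "$1");
    }

    public static void main(String[] args) {
        System.out.println(removeRepeats("Goodbye bye bye world world world"));
        // prints: Goodbye bye world
    }
}
```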
Here is my understanding:
1. The initial \\b matches a word boundary.
2. (\\w+) matches one or more word characters.
3. In the expression (\\b\\W+\\b\\1\\b)*:
   a. \\b matches a word boundary.
   b. \\W+ matches one or more non-word characters.
   c. \\b again matches a word boundary.
   d. \\1 ??? I don't know what this is for, but it won't work without it.
   e. \\b again matches a word boundary.
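The observation that the pattern stops working without \\1 can be reproduced with a small comparison. This is only a sketch: the variant with \\1 deleted is my own assumption about what "without this" means, and the class and method names are hypothetical.

```java
class BackrefCheck {
    // Pattern from the question.
    static final String WITH_BACKREF = "\\b(\\w+)(\\b\\W+\\b\\1\\b)*";
    // Hypothetical variant: the same pattern with \\1 deleted.
    static final String WITHOUT_BACKREF = "\\b(\\w+)(\\b\\W+\\b\\b)*";

    // Assumption: matches are replaced by the first captured group ($1).
    static String apply(String pattern, String input) {
        return input.replaceAll(pattern, "$1");
    }

    public static void main(String[] args) {
        String input = "Goodbye bye bye world world world";
        // With \\1 the repeated words collapse as expected.
        System.out.println(apply(WITH_BACKREF, input));
        // Without \\1 the result no longer matches the expected output.
        System.out.println(apply(WITHOUT_BACKREF, input));
    }
}
```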
As you can see, my main confusion is about item 3, and especially \\1.
Can anyone explain it more clearly?