This is not homework; I'm just trying to learn and get better at regular expressions.
I'm trying to find words that are repeated one or more times in a row in a string and remove the repeats. I've looked at link1 and link2 and tried using their patterns, but they don't seem to work for me.
Here is what I have:
String pattern = "\\b(\\w+)\\b\\s+\\1\\b";
Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
// This is actually read from the console
String input = "Goodbye bye bye world world world";
Matcher m = p.matcher(input);
while (m.find())
{
    System.out.println("group: " + m.group() + " start: " + m.start() + " end: " + m.end() + " group(1): " + m.group(1));
    input = input.replaceAll(m.group(), m.group(1));
}
System.out.println(input);
And this is my output:
group: bye bye start: 8 end: 15 group(1): bye
group: world world start: 16 end: 27 group(1): world
Goodbye bye world world
What I'm expecting for the 2nd line of output is "group: world world world start: 16 end: 33".
So, to me, it seems like this is matching only the first repeat of each word. My understanding of the pattern is: \b - word boundary; (\w+) - one or more of the word (I'm not sure whether that means the word repeated WITHOUT a space, i.e. 'wordword', or the word repeated WITH a space, i.e. 'word word'); \b\s+ - followed by any whitespace; \1 - the grouped word; and finally \b - whitespace again.
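For comparison, here is a minimal sketch of one possible variant, assuming the goal is to match a word followed by any number of repeats of itself. It wraps the whitespace and the backreference in a non-capturing group with a + quantifier, so the "\s+\1\b" part may occur more than once. The class name RepeatedWordsSketch and the method removeRepeats are made up for illustration, and this is a guess at what the pattern might need to be, not a confirmed fix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class RepeatedWordsSketch {
    // Same pieces as the pattern above, but "whitespace + backreference" is
    // wrapped in a non-capturing group that may repeat one or more times.
    static final Pattern REPEATS =
            Pattern.compile("\\b(\\w+)\\b(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Replaces each run of a repeated word with the first copy of that word.
    static String removeRepeats(String input) {
        return REPEATS.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String input = "Goodbye bye bye world world world";
        Matcher m = REPEATS.matcher(input);
        while (m.find()) {
            System.out.println("group: " + m.group() + " start: " + m.start()
                    + " end: " + m.end() + " group(1): " + m.group(1));
        }
        System.out.println(removeRepeats(input));
    }
}
```

On the sample input this should report "bye bye" at 8 to 15 and "world world world" at 16 to 33, and the final line printed should be "Goodbye bye world". Doing the replacement through the Matcher itself, rather than calling String.replaceAll inside the find() loop, also avoids treating the matched text as a new regular expression.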
Can someone explain to me what's going on and what the pattern should be?
Thanks!