It all depends on what you understand a "word" to be. Perhaps you'd better define what you consider a word delimiter (for example, blanks, commas, ...) and write something like:
phrase=phrase.replaceAll("([\\s,.;])" + Pattern.quote(word) + "([\\s,.;])","$1$2");
But you'll additionally have to check for occurrences at the start and the end of the string, because this pattern requires a delimiter on both sides of the word.
For example:
String phrase="bob has a bike bob, bob and boba bob's bike is red and \"bob\" stuff.";
String word="bob";
phrase=phrase.replaceAll("([\\s,.;])" + Pattern.quote(word) + "([\\s,.;])","$1$2");
System.out.println(phrase);
This prints the following (the leading "bob" survives because no delimiter precedes it):
bob has a bike ,  and boba bob's bike is red and "bob" stuff.
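The start-of-string and end-of-string caveat can be sidestepped with lookarounds instead of capturing groups. This is a minimal sketch, not the approach used above: the class name `RemoveWord`, the method `removeWord` and the constant `NON_DELIM` are made up for illustration, and the delimiter set (whitespace, comma, period, semicolon) is simply the one assumed earlier. A negative lookaround that says "no non-delimiter next to the word" is also satisfied at either end of the string, and because lookarounds consume nothing, two occurrences separated by a single delimiter are both removed.

```java
import java.util.regex.Pattern;

class RemoveWord {
    // Assumed delimiter set: whitespace, comma, period, semicolon.
    // NON_DELIM matches any character that is NOT one of those delimiters.
    private static final String NON_DELIM = "[^\\s,.;]";

    static String removeWord(String phrase, String word) {
        // (?<!X) : not preceded by a non-delimiter (true at start of string too)
        // (?!X)  : not followed by a non-delimiter (true at end of string too)
        String regex = "(?<!" + NON_DELIM + ")"
                     + Pattern.quote(word)
                     + "(?!" + NON_DELIM + ")";
        return phrase.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String phrase = "bob has a bike bob, bob and boba bob's bike is red and \"bob\" stuff.";
        // Removes the three standalone occurrences, including the leading one;
        // "boba", "bob's" and "\"bob\"" are left alone.
        System.out.println(removeWord(phrase, "bob"));
    }
}
```

The delimiters themselves are left in place, so you may still want to collapse the doubled spaces afterwards.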
Update: If you insist on using \b, bearing in mind that its notion of "word boundary" understands Unicode, you can also use this dirty trick: replace all occurrences of ' (and, as in the code, of ") with some Unicode letter that you are sure will not appear in your text, and afterwards do the reverse replacement. Example:
String phrase="bob has a bike bob, bob and boba bob's bike is red and \"bob\" stuff.";
String word="bob";
phrase= phrase.replace("'","ñ").replace('"','ö');
phrase=phrase.replaceAll("\\b" + Pattern.quote(word) + "\\b","");
phrase= phrase.replace('ö','"').replace("ñ","'");
System.out.println(phrase);
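The trick silently corrupts the text if a placeholder letter already occurs in it, so it may be worth guarding against that. The following is only a sketch under stated assumptions: the class `PlaceholderTrick`, its method and its constants are hypothetical names, and passing `Pattern.UNICODE_CHARACTER_CLASS` (available since Java 7) is an addition made here so that \b is guaranteed to treat the placeholder letters as word characters rather than relying on the default behaviour.

```java
import java.util.regex.Pattern;

class PlaceholderTrick {
    // Hypothetical placeholders: letters assumed to be absent from the input.
    private static final char APOSTROPHE_STAND_IN = '\u00f1'; // ñ
    private static final char QUOTE_STAND_IN = '\u00f6';      // ö

    static String removeWord(String phrase, String word) {
        // Refuse to run if a placeholder is already present: the reverse
        // replacement would otherwise turn genuine letters into quotes.
        if (phrase.indexOf(APOSTROPHE_STAND_IN) >= 0 || phrase.indexOf(QUOTE_STAND_IN) >= 0) {
            throw new IllegalArgumentException("placeholder already present in input");
        }
        // Mask ' and " as letters so that \b no longer sees a boundary next to them.
        String masked = phrase.replace('\'', APOSTROPHE_STAND_IN)
                              .replace('"', QUOTE_STAND_IN);
        // Assumption: the flag makes \b Unicode-aware explicitly.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                                    Pattern.UNICODE_CHARACTER_CLASS);
        String stripped = p.matcher(masked).replaceAll("");
        // Undo the masking.
        return stripped.replace(QUOTE_STAND_IN, '"')
                       .replace(APOSTROPHE_STAND_IN, '\'');
    }

    public static void main(String[] args) {
        String phrase = "bob has a bike bob, bob and boba bob's bike is red and \"bob\" stuff.";
        System.out.println(removeWord(phrase, "bob"));
    }
}
```

Note that the word itself is not masked, so this sketch does not handle a search word that contains a quote character.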
UPDATE: To summarize some comments below: one would expect \w and \b to share the same notion of what a "word character" is, as they do in almost every regular-expression dialect. Well, in Java they do not: \w considers ASCII only, \b considers Unicode. It's an ugly inconsistency, I agree.
Update 2: Since Java 7 (as pointed out in the comments), the UNICODE_CHARACTER_CLASS flag lets you request consistent, Unicode-aware behaviour for both.
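A small sketch of what the flag changes, with hypothetical helper names (`UnicodeFlagDemo`, `isWordChar`, `findsWholeWord`). It only exercises behaviour that the flag pins down: \w without the flag is ASCII-only, and with the flag both \w and \b treat a letter such as ñ as a word character.

```java
import java.util.regex.Pattern;

class UnicodeFlagDemo {
    // Does the single-character string s match \w under the given flags?
    static boolean isWordChar(String s, int flags) {
        return Pattern.compile("\\w", flags).matcher(s).matches();
    }

    // Is `word` present as a whole word, with \b made Unicode-aware by the flag?
    static boolean findsWholeWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                               Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(text)
                      .find();
    }

    public static void main(String[] args) {
        String enye = "\u00f1"; // ñ
        System.out.println(isWordChar(enye, 0));                               // false
        System.out.println(isWordChar(enye, Pattern.UNICODE_CHARACTER_CLASS)); // true
        System.out.println(findsWholeWord("bob" + enye + " bike", "bob"));     // false
        System.out.println(findsWholeWord("bob bike", "bob"));                 // true
    }
}
```

The same flag can be switched on inside the regex itself with the embedded expression (?U).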