
Say, for example, I have the string "one two(three) (three) four five" and I want to replace "(three)" with "(four)", but not within words. How would I do it?

Basically I want to do a regex replace and end up with the following string:

"one two(three) (four) four five"

I have tried the following regex but it doesn't work:

@"\b\(three\)\b"

Basically, I am writing some search-and-replace code and am giving the user the usual options: match case, match whole word, etc. In this instance the user has chosen to match whole words, but I don't know in advance what the text being searched for will be.

CroweMan
  • Anything either side of a ( or ) will automatically be a word boundary, because it's not in between two word characters – Gareth Aug 12 '10 at 13:48

5 Answers


Your problem stems from a misunderstanding of what \b actually means. Admittedly, it is not obvious.

The reason \b\(three\)\b doesn’t match the threes in your input string is the following:

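  • \b means: the boundary between a word character and a non-word character (the start and end of the string behave like non-word characters here).
  • Letters (e.g. a-z) are considered word characters, as are digits and the underscore.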
  • Punctuation marks such as ( are considered non-word characters.

Here is your input string again, stretched out a bit, and I’ve marked the places where \b matches:

 o n e   t w o ( t h r e e )   ( t h r e e )   f o u r   f i v e
↑     ↑ ↑     ↑ ↑         ↑     ↑         ↑   ↑       ↑ ↑       ↑

As you can see here, there is a \b between “two” and “(three)”, but not before the second “(three)”.
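The marked positions can be checked with a short C# sketch (top-level statements, so .NET 6 or later is assumed). Because \b is zero-width, every match it produces is empty, and the match's Index is the boundary position, counted from 0 before the first character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "one two(three) (three) four five";

// \b is zero-width: every match is empty, and its Index is the position
// between two characters (0 = before the first character).
int[] boundaries = Regex.Matches(input, @"\b")
    .Cast<Match>()
    .Select(m => m.Index)
    .ToArray();

Console.WriteLine(string.Join(", ", boundaries));
// 0, 3, 4, 7, 8, 13, 16, 21, 23, 27, 28, 32

// No boundary at 14 (after the first ")") or at 15 (before the second "("),
// so \b\(three\)\b finds nothing in this input.
Console.WriteLine(Regex.IsMatch(input, @"\b\(three\)\b")); // False
```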

The moral of the story? “Whole-word search” doesn’t really make much sense if what you’re searching for is not just a word (a string of letters). Since you have punctuation characters (parentheses) in your search string, it is not as such a “word”. If you searched for a word consisting only of word characters, then \b would do what you expect.

You can, of course, use a different regex to match the string only if it is surrounded by whitespace or occurs at the beginning or end of the string:

(^|\s)\(three\)(\s|$)

However, the problem with this is, of course, that if you search for “three” (without the parentheses), it won’t find the one in “(three)” because it doesn’t have spaces around it, even though it is actually a whole word.
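A minimal C# sketch of both points, using the question's input. The second pattern is an assumed variant of the one above that uses the lookarounds (?<=...) and (?=...), so the surrounding whitespace is checked but not consumed.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "one two(three) (three) four five";

// (^|\s) and (\s|$) consume the surrounding whitespace, so $1 and $2 put it back.
string consumed = Regex.Replace(input, @"(^|\s)\(three\)(\s|$)", "$1(four)$2");
Console.WriteLine(consumed); // one two(three) (four) four five

// Lookaround variant: checks for whitespace without consuming it.
string lookaround = Regex.Replace(input, @"(?<=^|\s)\(three\)(?=\s|$)", "(four)");
Console.WriteLine(lookaround); // one two(three) (four) four five

// The drawback: "three" inside "(three)" is a whole word, but it is
// surrounded by parentheses rather than whitespace, so nothing is found.
Console.WriteLine(Regex.IsMatch(input, @"(^|\s)three(\s|$)")); // False
```

Not consuming the whitespace also matters for adjacent occurrences: with the consuming pattern, "(three) (three)" becomes "(four) (three)", because the first match uses up the space between them; the lookaround version gives "(four) (four)".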

I think most text editors (including Visual Studio) will use \b only if your search string actually starts and/or ends with a word character:

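// Escape the search text so characters such as ( and ) are matched literally.
var pattern = Regex.Escape(searchString);
// Require a word boundary only on a side where the search text has a word character.
if (Regex.IsMatch(searchString, @"^\w"))
    pattern = @"\b" + pattern;
if (Regex.IsMatch(searchString, @"\w$"))
    pattern = pattern + @"\b";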

That way they will find “(three)” even if you select “whole words only”.
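Putting this together, a find-and-replace routine along these lines might look like the sketch below. The helper name FindReplace and its wholeWord/matchCase parameters are hypothetical, chosen to mirror the dialog options from the question. Note that for "(three)" no \b is added on either side, so the "(three)" inside "two(three)" is replaced as well, consistent with the comment below that Visual Studio does the same.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper mirroring a find-and-replace dialog's options.
static string FindReplace(string input, string find, string replacement, bool wholeWord, bool matchCase)
{
    // Escape the search text so "(", ")" and other metacharacters are literal.
    string pattern = Regex.Escape(find);
    if (wholeWord)
    {
        // Require a boundary only on a side where the search text has a word char.
        // \A and \z anchor to the very start and end of the search text.
        if (Regex.IsMatch(find, @"\A\w"))
            pattern = @"\b" + pattern;
        if (Regex.IsMatch(find, @"\w\z"))
            pattern += @"\b";
    }
    RegexOptions options = matchCase ? RegexOptions.None : RegexOptions.IgnoreCase;
    // A MatchEvaluator returns the replacement verbatim, so "$1" typed by the
    // user is not interpreted as a substitution.
    return Regex.Replace(input, pattern, m => replacement, options);
}

Console.WriteLine(FindReplace("one two(three) (three) four five", "(three)", "(four)", wholeWord: true, matchCase: true));
// one two(four) (four) four five
Console.WriteLine(FindReplace("three threesome (three)", "three", "3", wholeWord: true, matchCase: true));
// 3 threesome (3)
Console.WriteLine(FindReplace("Three three", "three", "$0", wholeWord: true, matchCase: false));
// $0 $0
```

Passing the replacement through a MatchEvaluator rather than as a replacement string is a design choice for user-typed input: with the plain string overload, "$0" would be expanded to the matched text.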

Timwi
  • It possibly doesn't make sense, but that is how I would like it to work. Have you got any ideas how I could do this? Basically, I would like to mimic the find-and-replace functionality in Visual Studio. – CroweMan Aug 12 '10 at 13:46
  • @CroweMan: You are contradicting yourself. You said, “I don't want "two(three)" to be replaced”, but Visual Studio does. – Timwi Aug 12 '10 at 13:52
  • Thank you very much. You are a star! – CroweMan Aug 12 '10 at 13:55
  • Please [be careful](http://stackoverflow.com/questions/4213800/is-there-something-like-a-counter-variable-in-regular-expression-replace/4214173#4214173) of `\b` style boundaries. – tchrist Nov 18 '10 at 16:18

Here is some simple code you may be interested in:

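    // Escape the search text so regex metacharacters in it are matched literally.
    string pattern = @"\b" + Regex.Escape(find) + @"\b";
    // Regex.Replace returns a new string; the input string is not modified.
    string result = Regex.Replace(stringToSearch, pattern, replace, RegexOptions.IgnoreCase);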

Source code: snip2code - C#: Replace an exact word in a sentence

Dominique Terrs
  • NOTE: As mentioned in the accepted answer, this will only work as expected if `find` is a string of ONLY "word" (`\w`) characters. It won't work for this question, where find is `(three)`. – ToolmakerSteve Dec 15 '22 at 22:48

See what a word boundary matches:

A word boundary can occur in one of three positions:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.

So, your \b\(three\)\b regex DOES work, but NOT the way you expected. It does not match (three) in "In (three) years", "In(three) years" or "In (three)years", but it does match in "In(three)years", because there are word boundaries between n and ( and between ) and y.
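The four cases can be reproduced with a short C# sketch (assuming .NET's Regex and top-level statements):

```csharp
using System;
using System.Text.RegularExpressions;

const string pattern = @"\b\(three\)\b";
string[] samples = { "In (three) years", "In(three) years", "In (three)years", "In(three)years" };

foreach (string s in samples)
{
    // True only when "(" follows a word char AND ")" is followed by one.
    bool found = Regex.IsMatch(s, pattern);
    Console.WriteLine($"{s}: {found}");
}
// In (three) years: False
// In(three) years: False
// In (three)years: False
// In(three)years: True
```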

What you can do in these situations is use dynamic adaptive word boundaries: constructs that enforce whole-word matching only on the sides where it is expected (see my "Dynamic adaptive word boundaries" YouTube video for a visual explanation of these constructs).

In C#, it can be written as

@"(?!\B\w)\(three\)(?<!\w\B)"

In short:

  • (?!\B\w) - only require a word boundary on the left if the char that follows the word boundary is a word char
  • \(three\)
  • (?<!\w\B) - only require a word boundary on the right if the char that precedes the word boundary is a word char.
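As a sketch, assuming a hypothetical helper AdaptivePattern that escapes any phrase and wraps it in these two lookarounds: a made-up phrase like "C++" (word character on the left, non-word on the right) shows the difference from plain \b, which would also demand a boundary after the final "+".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: escape the phrase and wrap it in the adaptive boundaries.
static string AdaptivePattern(string phrase) =>
    @"(?!\B\w)" + Regex.Escape(phrase) + @"(?<!\w\B)";

string text = "C++ and ObjC++ and C++11";

// "C" is a word char, so a boundary is required on the left ("ObjC++" is skipped);
// "+" is not, so nothing is required on the right ("C++11" matches).
Console.WriteLine(Regex.Replace(text, AdaptivePattern("C++"), "X"));
// X and ObjC++ and X11

// Plain \b on both sides also demands a boundary after the final "+",
// which only exists when a word char follows it.
Console.WriteLine(Regex.Replace(text, @"\b" + Regex.Escape("C++") + @"\b", "X"));
// C++ and ObjC++ and X11

// Neither edge of "(three)" is a word char, so both occurrences are replaced.
Console.WriteLine(Regex.Replace("one two(three) (three) four five", AdaptivePattern("(three)"), "(four)"));
// one two(four) (four) four five
```

For "(three)" this gives the same result as adding \b only on sides that start or end with a word character, as in the accepted answer.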

If your search phrases can contain whitespace and you need to match longer alternatives first, you can build the pattern dynamically from a list like this:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
var phrases = new List<string> { @"(one)", @".two.", "[three]" };
phrases = phrases.OrderByDescending(x => x.Length).ToList();
var pattern = $@"(?!\B\w)(?:{string.Join("|", phrases.Select(z => Regex.Escape(z)))})(?<!\w\B)";

The resulting pattern, (?!\B\w)(?:\[three]|\(one\)|\.two\.)(?<!\w\B), matches what you'd expect; see the C# demo and the regex demo.
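To see why longer phrases go first, here is a sketch with a hypothetical BuildPattern helper and made-up phrases "ice" and "ice cream". .NET alternation tries alternatives from left to right and keeps the first one that lets the whole pattern succeed, so with "ice" listed first, "ice cream" is only partly matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: one adaptive-boundary alternation built from a phrase list.
static string BuildPattern(IEnumerable<string> phrases, bool longestFirst)
{
    IEnumerable<string> ordered = longestFirst ? phrases.OrderByDescending(p => p.Length) : phrases;
    string alternation = string.Join("|", ordered.Select(p => Regex.Escape(p)));
    return @"(?!\B\w)(?:" + alternation + @")(?<!\w\B)";
}

var phrases = new List<string> { "ice", "ice cream" };
string text = "ice cream and iced tea and ice";

// Wrap every match in brackets to make it visible.
Console.WriteLine(Regex.Replace(text, BuildPattern(phrases, longestFirst: true), m => $"[{m.Value}]"));
// [ice cream] and iced tea and [ice]
Console.WriteLine(Regex.Replace(text, BuildPattern(phrases, longestFirst: false), m => $"[{m.Value}]"));
// [ice] cream and iced tea and [ice]
```

In both runs "iced" is skipped, because "ice" ends in a word character and the right-hand lookbehind then requires a real word boundary.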

Wiktor Stribiżew

I recently came across a similar issue in JavaScript while trying to match terms with a leading '$' character only as separate words, e.g. if $hot = 'FUZZ', then:

"some $hot $hotel bird$hot pellets" ---> "some FUZZ $hotel bird$hot pellets"

The regex /\b\$hot\b/g (my first guess) did not work, for the same reason the parentheses did not match in the original question: '$' is a non-word character, so there is no word/non-word boundary between it and a preceding space or the start of the string.

However, the regex /\B\$hot\b/g does work, which shows that the positions not marked in @timwi's excellent example are exactly where \B matches. This was not intuitive to me, because ") (" contains no regex word characters. But since \B is simply the inverse of the \b assertion, it doesn't need word characters at all; it matches at every position that is not a word boundary :)
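The same behaviour can be reproduced in C# (the answer uses JavaScript, but for this ASCII-only input \b and \B behave the same in .NET):

```csharp
using System;
using System.Text.RegularExpressions;

string text = "some $hot $hotel bird$hot pellets";

// \b before "$" requires a word char on its left, so only "bird$hot" is hit.
string withB = Regex.Replace(text, @"\b\$hot\b", "FUZZ");
Console.WriteLine(withB); // some $hot $hotel birdFUZZ pellets

// \B before "$" requires a non-word char (or the string start) on its left.
string withNotB = Regex.Replace(text, @"\B\$hot\b", "FUZZ");
Console.WriteLine(withNotB); // some FUZZ $hotel bird$hot pellets
```

So the first guess does not just fail to match; it replaces the wrong occurrence.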

jongala

As Gopi said, but (theoretically) catching only " (three) " and not "two(three)":

string input = "one two(three) (three) four five";

string output = input.Replace(" (three) ", " (four) ");

When I test that, I get "one two(three) (four) four five". Just remember that whitespace is a character in the string too, so it can also be replaced. If I did this:

//use same input
string output = input.Replace(" ", ";");

I'd get "one;two(three);(three);four;five"

AllenG
  • The problem is that the user is entering the text in a find-and-replace box and has selected 'match whole words'. So I need to use something intelligent like regular expressions, and I can't just add a " " before or after the expression, as the adjacent character could be a ',' or something else. – CroweMan Aug 12 '10 at 13:45
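A quick C# check of the limitation described in this comment, using a made-up input where "(three)" starts the string and is followed by a comma, and another "(three)" ends it:

```csharp
using System;

string input = "(three), one (three) two (three)";

// Padding the search text with spaces only finds occurrences that have a
// space on both sides: the first (string start, then a comma) and the last
// (string end) are missed.
string output = input.Replace(" (three) ", " (four) ");
Console.WriteLine(output); // (three), one (four) two (three)
```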