6

Code

using System;
using System.Text.RegularExpressions;

namespace RegexNoMatch {
    class Program {
        static void Main () {
            string input = "a foobar& b";
            string regex1 = "(foobar|foo)&?";
            string regex2 = "(foo|foobar)&?";
            string replace = "$1";
            Console.WriteLine(Regex.Replace(input, regex1, replace));
            Console.WriteLine(Regex.Replace(input, regex2, replace));
            Console.ReadKey();
        }
    }
}

Expected output

a foobar b
a foobar b

Actual output

a foobar b
a foobar& b

Question

Why does replacing not work when the order of "foo" and "foobar" in regex pattern is changed? How to fix this?

Athari
  • 33,702
  • 16
  • 105
  • 146

1 Answers1

8

The regular expression engine tries to match the alternatives in the order in which they are specified. So when the pattern is (foo|foobar)&? it matches foo immediately and continues trying to find matches. The next bit of the input string is bar& b which cannot be matched.

In other words, because foo is part of foobar, there is no way (foo|foobar) will ever match foobar, since it will always match foo first.

Occasionally, this can be a very useful trick, actually. The pattern (o|a|(\w)) will allow you to capture \w and a or o differently:

Regex.Replace("a foobar& b", "(o|a|(\\w))", "$2") // fbr& b
p.s.w.g
  • 146,324
  • 30
  • 291
  • 331
  • Why isn't regex greedy? I thought it is supposed to match the longest string it can match. – Athari Aug 02 '13 at 13:13
  • 1
    @Athari greediness applies to quantifiers, not to alternations. – p.s.w.g Aug 02 '13 at 13:14
  • Is there a way to force greediness for alternations, or do I have to sort alternations in reverse alphabetical order? – Athari Aug 02 '13 at 13:16
  • 1
    @Athari Alphabetical order makes no difference. Alternations should be sorted by the *broadest* pattern first, e.g. `foobar` is broader than `foo` because any string that matches `foo` will also match `foobar` (of course `foo(bar)?` is more sensible here). Unless you're trying to use the trick described in my updated answer. – p.s.w.g Aug 02 '13 at 13:22
  • In my real case, the list is quite long, so making regex more complex isn't worth it. I guess [this suggestion](http://stackoverflow.com/a/2395056/293099) solves my problem, as I need to match separate words. – Athari Aug 02 '13 at 13:27
  • @Athari Yes, if you want to match whole words only, `\b(…)\b` is the way to go. – p.s.w.g Aug 02 '13 at 13:33