I'm working with C# and want to parse phone numbers from a string. I live in Switzerland, where phone numbers either have 10 digits in the pattern 000 000 00 00 or start with +41, as in +41 00 000 00 00. I've written the following regular expression:

var phone = new Regex(@"\b(\+41\s\d{2}|\d{3})\s?\d{3}\s?\d{2}\s?\d{2}\b");

This works fine for the first format, but the one starting with +41 doesn't match. I'm fairly sure the problem is the word boundary \b followed by the +: when I remove the \b at the start, the +41 example matches. My code:

    using System;
    using System.Text.RegularExpressions;

    var phone = new Regex(@"\b(\+41\s\d{2}|\d{3})\s?\d{3}\s?\d{2}\s?\d{2}\b");

    var text = @"My first phonenumber is: +41 00 000 00 00. My second one is:
    000 000 00 00. End.";

    // Print every phone number found in the text.
    var phoneMatches = phone.Matches(text);
    foreach (Match match in phoneMatches)
    {
        Console.WriteLine(match.Value);
    }
    Console.ReadKey();

Output: 000 000 00 00.

Output without \b:

+41 00 000 00 00
000 000 00 00

Any solutions?

Damien Flury

2 Answers

You may use a (?<!\w) negative lookbehind instead of the first \b. A word boundary requires a word char on exactly one side of the position. Since the next expected character, +, is a non-word char, \b matches there only if a word char precedes it, so it fails after a space. (?<!\w) fails the match only when there is a word char immediately before the next expected char.

Use

var phone = new Regex(@"(?<!\w)(\+41\s\d{2}|\d{3})\s?\d{3}\s?\d{2}\s?\d{2}\b");
                        ^^^^^^^
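
The difference between the two constructs can be checked in isolation. The following is a minimal sketch, not part of the original answer; the inputs " +41" and "a+41" and the variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); on older compilers, move this into Main.

// \b needs a word char on exactly one side of the position. "+" is a
// non-word char, so \b before "+" holds only if a word char precedes it.
bool boundaryAfterSpace = Regex.IsMatch(" +41", @"\b\+41");   // space before "+"
bool boundaryAfterLetter = Regex.IsMatch("a+41", @"\b\+41");  // letter before "+"

// (?<!\w) only requires that the previous char is not a word char
// (the start of the string also qualifies).
bool lookbehindAfterSpace = Regex.IsMatch(" +41", @"(?<!\w)\+41");
bool lookbehindAfterLetter = Regex.IsMatch("a+41", @"(?<!\w)\+41");

Console.WriteLine($"word boundary, after space:  {boundaryAfterSpace}");
Console.WriteLine($"word boundary, after letter: {boundaryAfterLetter}");
Console.WriteLine($"lookbehind, after space:     {lookbehindAfterSpace}");
Console.WriteLine($"lookbehind, after letter:    {lookbehindAfterLetter}");
```

The two constructs give opposite answers on these inputs: \b accepts the + only after a letter, while (?<!\w) accepts it only when no word char precedes it.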

Details

  • (?<!\w) - fail the match if there is a word char immediately to the left of the current location
  • (\+41\s\d{2}|\d{3}) - +41, a whitespace and 2 digits, or 3 digits
  • \s? - 1 or 0 whitespaces
  • \d{3} - 3 digits
  • \s? - 1 or 0 whitespaces
  • \d{2} - 2 digits
  • \s? - 1 or 0 whitespaces
  • \d{2} - 2 digits
  • \b - a word boundary (this one will work since the previous expected char is a digit).
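
To see the pieces above working together, here is a small sketch that runs the pattern against the sample text from the question. It assumes the text is written on a single line for simplicity; the `numbers` array is a name introduced here for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); on older compilers, move this into Main.

var phone = new Regex(@"(?<!\w)(\+41\s\d{2}|\d{3})\s?\d{3}\s?\d{2}\s?\d{2}\b");

// Sample text from the question, joined onto one line (assumption).
var text = "My first phonenumber is: +41 00 000 00 00. My second one is: 000 000 00 00. End.";

// Collect the matched substrings; Cast<Match>() keeps this working on
// older frameworks where MatchCollection is not generic.
string[] numbers = phone.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

foreach (var number in numbers)
{
    Console.WriteLine(number);
}
```

With the lookbehind in place, both the +41 number and the 10-digit number are reported.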

NOTE: In .NET, \d matches any Unicode decimal digit, not just 0-9. To match only ASCII digits, you might want to replace \d with [0-9] (see this thread).
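
The note about ASCII digits can be illustrated with a short sketch. The test character (ARABIC-INDIC DIGIT THREE, U+0663) and the [0-9] variant of the pattern are chosen here for illustration and are not from the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); on older compilers, move this into Main.

// U+0663 is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
string arabicIndicThree = "\u0663";

// By default, .NET's \d accepts any Unicode decimal digit; [0-9] does not.
bool matchedByBackslashD = Regex.IsMatch(arabicIndicThree, @"^\d$");
bool matchedByAsciiClass = Regex.IsMatch(arabicIndicThree, @"^[0-9]$");

// Hypothetical ASCII-only variant of the phone pattern.
var asciiPhone = new Regex(@"(?<!\w)(\+41\s[0-9]{2}|[0-9]{3})\s?[0-9]{3}\s?[0-9]{2}\s?[0-9]{2}\b");
bool asciiPatternMatches = asciiPhone.IsMatch("Call 000 000 00 00 today");

Console.WriteLine($"\\d matches U+0663:    {matchedByBackslashD}");
Console.WriteLine($"[0-9] matches U+0663: {matchedByAsciiClass}");
Console.WriteLine($"ASCII pattern matches: {asciiPatternMatches}");
```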

Wiktor Stribiżew

Try this one:

(\+\b41\s\d{2}|\b\d{3})\s?\d{3}\s?\d{2}\s?\d{2}\b

Move the word boundary inside the () group, and place the + before the word boundary.
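
As a quick check, this sketch runs the pattern against the question's sample text (joined onto one line here, which is an assumption). The extra input "x+41 00 000 00 00" is made up to show one behavior worth knowing: because the boundary sits after the +, this pattern does not look at what precedes the +.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Top-level statements (C# 9+); on older compilers, move this into Main.

var pattern = @"(\+\b41\s\d{2}|\b\d{3})\s?\d{3}\s?\d{2}\s?\d{2}\b";
var text = "My first phonenumber is: +41 00 000 00 00. My second one is: 000 000 00 00. End.";

string[] found = Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();
foreach (var number in found)
{
    Console.WriteLine(number);
}

// The \b between "+" and "4" always holds, so a letter directly before
// the "+" does not prevent a match with this pattern.
bool matchesAfterLetter = Regex.IsMatch("x+41 00 000 00 00", pattern);
Console.WriteLine($"matches when a letter precedes +: {matchesAfterLetter}");
```

Whether matching a + that directly follows a letter is acceptable depends on the input you expect.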

Istvan