1

I've searched around a bit, but it's hard to tell if this exact question has been answered before. I know you'll let me know if this is a duplicate.

I have a regular expression that matches a series of one or more positive integers preceded by a backslash. For example: \12345 would match, but \1234f or 12345 would not match.

The regex I'm using is ^\\(\d+)$

When I test the expression using various testers it works. For example, see: http://regex101.com/r/cY2bI1/1

However, when I implement it in the following c# code, I fail to get a match.

The implementation:

public string ParseRawUrlAsAssetNumber(string rawUrl) {
    var result = string.Empty;
    const string expression = @"^\\([0-9]+)$";
    var rx = new Regex(expression);
    var matches = rx.Matches(rawUrl);
    if (matches.Count > 0)
    {
        result = matches[0].Value;
    }
    return result;
}

The failing test (NUnit):

[Test]
public void ParseRawUrlAsAssetNumber_Normally_ParsesTheUrl() {
    var f = new Forwarder();
    var validRawUrl = @"\12345";
    var actualResult = f.ParseRawUrlAsAssetNumber(validRawUrl);
    var expectedResult = "12345";
    Assert.AreEqual(expectedResult, actualResult);
}

The test's output:

Expected string length 5 but was 6. Strings differ at index 0.
Expected: "12345"
But was:  "\\12345"
-----------^

Any ideas?

Resolution:

Thanks everyone for the input. In the end I took the following route based on your recommendations, and it is passing tests now.

public string ParseRawUrlAsAssetNumber(string rawUrl)
{
    var result = string.Empty;
    const string expression = @"^\\([0-9]+)$";
    var rx = new Regex(expression);
    var matches = rx.Matches(rawUrl);
    if (matches.Count > 0)
    {
        result = matches[0].Groups[1].Value;
    }
    return result;
}
Joe
  • 776
  • 7
  • 20

3 Answers3

7

The problem is this line:

var rx = new Regex(Regex.Escape(expression));

By escaping your expression you're turning all your special regex characters into literals. Calling Regex.Escape(@"^\\(\d+)$") will return "\^\\\\\(\\d\+\)\$", which will only match the literal string "^\\(\d+)$"

Try just this:

var rx = new Regex(expression);

See MSDN: Regex.Escape for a full explanation and examples of how this method is intended to be used.


Given your updated question, it seems like you also have an issue here:

result = matches[0].Value;

This will return the entire matched substring, not just the first capture group. For that you'll have to use:

result = matches[0].Groups[1].Value;
p.s.w.g
  • 146,324
  • 30
  • 291
  • 331
  • Oops, that was an old version of the code. I'll make an edit and show the current version (the problem still exists). – Joe Jul 10 '14 at 22:03
  • See my updated answer. Also, while it's fine to edit the question to provide more information, please try not to edit it in a way that largely invalidates the answers. If you must, post the updated code under the original code and clearly mark it as an edit. – p.s.w.g Jul 10 '14 at 22:10
  • Ah, very good. I wasn't aware that matches[0] would return the full substring. Thanks. – Joe Jul 10 '14 at 22:13
  • Good point about editing the post in that way. I'll avoid that in the future. – Joe Jul 10 '14 at 22:14
  • Maybe I'm missing something. But your solution will throw an ArgumentOutOfRange exception. Only one match will be here. See my answer - OP should use Group value instead – Sergey Berezovskiy Jul 10 '14 at 22:15
  • 1
    @SergeyBerezovskiy Yes, I noticed the error after posting my initial update. Please check again. – p.s.w.g Jul 10 '14 at 22:16
3

Don't escape pattern. Also simply use Regex.Match thus you are going to have single match here. Use Match.Success to check if input matched your pattern. And return group value - digits are in group of your matched expression:

public string ParseRawUrlAsAssetNumber(string rawUrl)
{            
    const string pattern = @"^\\(\d+)$";

    var match = Regex.Match(rawUrl, pattern);
    if (!match.Success)
        return String.Empty;

    return match.Groups[1].Value;
}
Sergey Berezovskiy
  • 232,247
  • 41
  • 429
  • 459
  • 2
    +1 For the clean re-write, but the `if` is not necessary either. Even if it fails to match `.Groups[1].Value` will be an empty string. – p.s.w.g Jul 10 '14 at 22:18
  • 2
    @p.s.w.g thanks! Didn't know about this trick with empty group returned by default. Very cool. Whole method can be written as `return Regex.Match(rawUrl, @"^\\(\d+)$").Groups[1].Value` – Sergey Berezovskiy Jul 10 '14 at 22:43
1

What if you try get the group result instead?

match.Groups[1].Value

When I get to a real computer I'll test, but seems like it should work

Kyle Gobel
  • 5,530
  • 9
  • 45
  • 68
  • Yes, it should be the solution http://stackoverflow.com/questions/6375873/regular-expression-groups-in-c-sharp match.Groups[0] is always the same as match.Value, which is the entire match. match.Groups[1] is the first capturing group in your regular expression. – sarh Jul 10 '14 at 22:09