
I have a regular expression that I tested on http://gskinner.com/RegExr/, where it worked, but it failed when I used it in my C# application.

My regex: (?<!\d)\d{6}\K\d+(?=\d{4}(?!\d))
Text: 4000751111115425
Result: 111111

What is wrong with my regex?

RooiWillie
XandrUu
  • 3
    .NET's regular expression engine doesn't support `\K`. I believe the closest you can get is `(?<!\d)\d{6}(\d+)(?=\d{4}(?!\d))` and then look at `match.Groups[1].Value` rather than `match.Value`. (N.B. this would be an answer if a ♦ hadn't one-shot closed this question.) – Rawling Jan 07 '13 at 17:18
  • 3
    However I think you're overcomplicating things. `(?<=\d{6})\d+(?=\d{4})` should work as well because the `+` is greedy. – Rawling Jan 07 '13 at 20:33
  • \K is not supported by the .NET regex implementation: http://stackoverflow.com/q/13542950/736079 and this can indeed be solved with a lookbehind or with a capturing group. – jessehouwing Jan 07 '13 at 20:58
  • 1
    @Rawling It appears you can post your answer as an answer now. – mickmackusa Nov 21 '17 at 02:30

2 Answers


The issue you are having is that .NET regular expressions do not support \K ("discard what has been matched so far").

I believe your regex translates as "match any run of more than ten digits, taking as many digits as possible, and discard the first 6 and the last 4".

I believe that the .NET-compliant regex

(?<=\d{6})\d+(?=\d{4})

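achieves the same thing. Note that the negative lookbehind/lookahead guarding against extra digits is not necessary: the leftmost possible match starts right after the first six digits of the run, and because \d+ is greedy, the engine will already try to match as many digits as possible.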
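A minimal C# sketch of this lookbehind version, run against the sample input from the question. The class and method names (`MiddleDigits`, `Extract`) are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class MiddleDigits
{
    // Hypothetical helper: returns the digits between the first six and the
    // last four digits of the first run of more than ten digits, or "" if
    // there is no such run.
    public static string Extract(string input)
    {
        // The lookbehind asserts six digits before the match and the
        // lookahead asserts four digits after it; neither is part of
        // match.Value, so no \K is needed.
        Match match = Regex.Match(input, @"(?<=\d{6})\d+(?=\d{4})");
        return match.Success ? match.Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("4000751111115425")); // 111111
    }
}
```

Here `match.Value` can be used directly, because the lookarounds only assert their context and never consume it.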

Rawling
  • and what if I have this regex (will delete empty spaces from a tag) `\s\S*(?:

    |\G)(?:(?!

    ).)*?\s\K\s+|(?<=

    )\s+|\s+(?=

    )` ? What will be the \K alternative ?
    – Just Me Dec 02 '19 at 21:56
  • @JustMe I feel like you'd just put the bit before the `\K` into positive lookbehind, like the second alternate already uses (`(?<= ... )`) – Rawling Dec 03 '19 at 08:26

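In general, the \K operator (which discards all text matched so far from the match memory buffer) can be emulated with two techniques: a positive lookbehind that wraps the left-hand context, or a capturing group around the part of the match you want to keep.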

For example,

  • PCRE `a+b+c+=\K\d+` = .NET `(?<=a+b+c+=)\d+` or `a+b+c+=(\d+)` (and grab the Group 1 value)
  • PCRE `^[^][]+\K.*` = .NET `(?<=^[^][]+)(?:\[.*)?$` or (better here) `^[^][]+(.*)`.

The problem with the second example is that [^][]+ can match the same text as .* (the two patterns overlap). Since there is no clear boundary between them, a plain lookbehind does not work and needs additional tricks.

The capturing group approach is universal and should work in all situations.
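The difference can be sketched in C# as follows. This is only an illustration: the class and method names and the sample inputs are invented, and the character class is written in its fully escaped form `[^\]\[]`, which is assumed here to be equivalent to `[^][]`:

```csharp
using System;
using System.Text.RegularExpressions;

public static class KEmulation
{
    // Lookbehind emulation of PCRE a+b+c+=\K\d+ :
    // the left-hand context is asserted but not consumed.
    public static string ViaLookbehind(string input) =>
        Regex.Match(input, @"(?<=a+b+c+=)\d+").Value;

    // Capturing-group emulation: the context is consumed and the wanted
    // part is read from Group 1.
    public static string ViaGroup(string input) =>
        Regex.Match(input, @"a+b+c+=(\d+)").Groups[1].Value;

    // Overlapping patterns: a naive lookbehind for PCRE ^[^][]+\K.*
    // succeeds as soon as one character precedes the match position.
    public static string OverlapNaiveLookbehind(string input) =>
        Regex.Match(input, @"(?<=^[^\]\[]+).*").Value;

    // The capturing group keeps the greedy left-hand part intact.
    public static string OverlapViaGroup(string input) =>
        Regex.Match(input, @"^[^\]\[]+(.*)").Groups[1].Value;

    public static void Main()
    {
        Console.WriteLine(ViaLookbehind("aabbcc=123"));         // 123
        Console.WriteLine(ViaGroup("aabbcc=123"));              // 123
        Console.WriteLine(OverlapNaiveLookbehind("abc[def]"));  // bc[def]
        Console.WriteLine(OverlapViaGroup("abc[def]"));         // [def]
    }
}
```

With the overlapping patterns, the naive lookbehind is already satisfied after the first character, so the match starts too early. The capturing group returns the same text that \K would leave in PCRE.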

Since \K makes the regex engine "forget" the part of a match consumed so far, the best approach here is to use a capturing group to grab the part of a match you need to obtain after the left-hand context:

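using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var text = "Text  4000751111115425";
        // \K is unavailable in .NET, so capture the wanted digits in Group 1 instead.
        var match = Regex.Match(text, @"(?<!\d)\d{6}(\d+)(?=\d{4}(?!\d))");
        // Regex.Match never returns null; if nothing matches, Groups[1].Value is "".
        var result = match.Groups[1].Value;
        Console.WriteLine($"Result: '{result}'"); // Result: '111111'
    }
}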

Details:

  • (?<!\d) - a left-hand digit boundary
  • \d{6} - six digits
  • (\d+) - Capturing group 1: one or more digits
  • (?=\d{4}(?!\d)) - a positive lookahead that matches a location that is immediately followed by four digits that are not themselves followed by another digit.
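If the input may contain several such numbers, the same pattern can be applied with Regex.Matches. The following is a rough sketch under that assumption; the named group `middle`, the helper name, and the second sample number are all made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AllMiddleDigits
{
    // Hypothetical helper: collects the captured middle digits of every
    // digit run that is longer than ten digits.
    public static List<string> ExtractAll(string input)
    {
        var results = new List<string>();
        // Same pattern as above, with the capture given a name for readability.
        var pattern = @"(?<!\d)\d{6}(?<middle>\d+)(?=\d{4}(?!\d))";
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Groups["middle"].Value);
        }
        return results;
    }

    public static void Main()
    {
        var found = ExtractAll("4000751111115425 and 5105105105105100");
        Console.WriteLine(string.Join(", ", found)); // 111111, 510510
    }
}
```

The digit boundaries (?<!\d) and (?!\d) are what keep each match aligned to a whole number here, so they are worth keeping when scanning longer text.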
Wiktor Stribiżew