With the following RegEx pattern:

(?<comment>(^#{2} [^\r\n]+[\s]+)*)(?:^\[(?:(?<hive>HK(?:LM|[DP]D|C[CUR]|U(SERS|SER|SR|S)))[:])?(?<name>[a-z0-9$][a-z0-9-_]{2,63})\])(?<items>[\S\s]*?)(?=\n{2,})

Parsing the following text:

[HKLM:Connection]
   AuthKey = 0x8a79b42z67fct29b42e07b3fd78nc540
   Url = https://dev.somewebsite.com
   ApiPath = /api/

[HKLM:Settings]
   AutoMinimizeConsole = no
   StyleFile = Default
   PhoneNbrs = [+]?[01]{0,3}[-. ]?[(]?[0-9][0-9][0-9][)]?[-. ]?[0-9][0-9][0-9][-. ]?[0-9][0-9][0-9][0-9]
   PostalCodes = [ABCEGHJKLMNPRSTVXYabceghjklmnprstvxy][0-9][ABCEGHJKLMNPRSTVWXYZabceghjklmnprstvwxyz][\s.-]?[0-9][ABCEGHJKLMNPRSTVWXYZabceghjklmnprstvwxyz][0-9]

[HKLM:Font-Mapping]
   MonoSpaced = Courier New
   User1 = Software Tester 7
   User2 = Repetition Scrolling
   User3 = basis333

[HKLM:UserInterface]

[HKCU:UserInterface]

[HKCU:Credentials]
   Username =
   Password? =

When entered into online regex testers, the pattern matches as expected, but in code, no matches are found. The "data" variable used here is populated with the text above before this segment runs:

public const string GROUP_PATTERN = @"(?<comment>(^#{2} [^\r\n]+[\s]+)*)(?:^\[(?:(?<hive>HK(?:LM|[DP]D|C[CUR]|U(SERS|SER|SR|S)))[:])?(?<name>[a-z0-9$][a-z0-9-_]{2,63})\])(?<items>[\S\s]*?)(?=\n{2,})";
Regex groupParser = new Regex(GROUP_PATTERN, RegexOptions.Compiled | RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase | RegexOptions.Multiline);
MatchCollection matches = groupParser.Matches(data);
foreach (Match m in matches)
    this.Add(IniGroupItem.Parse(m.Value));

At the start of the foreach, there are zero matches (there should be six!).

Since the pattern works on test sites, but not at all in C#, I don't know how to figure out what issues the compiler is having with it. Any insights / suggestions?

NetXpert
  • Looks like your file has got CRLF endings, try replacing `(?=\n{2,})` with `(?=(?:\r\n){2,})` – Wiktor Stribiżew Mar 30 '19 at 17:38
  • *doh*! -- okay, I went and replaced "(?=\n{2,})" with "(?=[\r\n]{3,})" and it's working as intended! If you post that as an answer I'll give you the credit! :) – NetXpert Mar 30 '19 at 17:48

1 Answer

The line endings in the majority of online regex testers are LF only. Had you tested your .NET regex at the RegexStorm .NET regex tester, you would have identified the issue sooner, since its line endings are CRLF.

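So, the problem is with (?=\n{2,}), as it requires two or more consecutive LF characters. In CRLF data a blank line appears as \r\n\r\n, where the two LF characters are separated by a CR, so the lookahead never succeeds. Since the actual data contains two or more consecutive \r\n sequences at those points, you need to replace that part of the pattern with (?=(?:\r\n){2,}).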
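
The difference can be checked in isolation. The following is a minimal sketch, not the asker's actual code: the class name and the shortened sample text are made up for illustration, and only the two lookaheads come from the discussion above.

```csharp
using System;
using System.Text.RegularExpressions;

class LookaheadDemo
{
    static void Main()
    {
        // Hypothetical sample: two sections separated by one blank line, CRLF endings.
        string data = "[HKLM:Connection]\r\n   ApiPath = /api/\r\n\r\n[HKLM:Settings]\r\n";

        // Original lookahead: needs two adjacent LF characters, which CRLF data never contains.
        Console.WriteLine(Regex.IsMatch(data, @"(?=\n{2,})"));        // False

        // Corrected lookahead: needs two adjacent CRLF pairs.
        Console.WriteLine(Regex.IsMatch(data, @"(?=(?:\r\n){2,})"));  // True
    }
}
```

Because the full pattern ends in this lookahead, a lookahead that can never succeed makes every overall match attempt fail, which is consistent with the empty MatchCollection in the question.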

If you say (?=[\r\n]{3,}) works for you, it means you want to match a location followed by 3 or more LF or CR characters, in any order.

In mixed cases, if you want to match a place followed by 2 or more line break sequences, each being CRLF, a lone CR, or LF, you may use (?=(?>\r\n?|\n){2,}).
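
To see where the character-class lookahead and the line-break-sequence lookahead differ, here is a small sketch under stated assumptions: the class and constant names and the three sample strings are hypothetical, with "a" and "b" standing in for real section text.

```csharp
using System;
using System.Text.RegularExpressions;

class LineBreakLookaheads
{
    // Three or more CR/LF characters, in any order.
    const string CharClass = @"(?=[\r\n]{3,})";
    // Two or more line breaks, each one CRLF, a lone CR, or LF.
    const string BreakSequences = @"(?=(?>\r\n?|\n){2,})";

    static void Main()
    {
        // LF-only blank line: just two characters, so the character class fails.
        Console.WriteLine(Regex.IsMatch("a\n\nb", CharClass));          // False
        Console.WriteLine(Regex.IsMatch("a\n\nb", BreakSequences));     // True

        // CRLF blank line: four characters and two breaks, so both succeed.
        Console.WriteLine(Regex.IsMatch("a\r\n\r\nb", CharClass));      // True
        Console.WriteLine(Regex.IsMatch("a\r\n\r\nb", BreakSequences)); // True

        // A single CRLF is one break; the atomic group keeps it from being
        // split into a CR and an LF and counted twice.
        Console.WriteLine(Regex.IsMatch("a\r\nb", BreakSequences));     // False
    }
}
```

The atomic group (?>...) matters in the last case: with a plain non-capturing group, the engine could backtrack, take \r alone as one break and \n as a second, and report a blank line where there is none.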

Wiktor Stribiżew
  • As I mentioned earlier, I went with (?=[\r\n]{3,}) which doesn't care about the sequence or demand the presence of both characters. Is there a reason that this way is an inferior method to yours? – NetXpert Mar 30 '19 at 17:58
  • @BrettLeuszler It all depends on your requirements. If you want to match a location followed with 3 or more LF or CR chars, use `(?=[\r\n]{3,})`. If you want to match a place followed with 2 or more line break sequences, `(?=(?>\r\n?|\n){2,})` is more precise – Wiktor Stribiżew Mar 30 '19 at 18:01
  • My concern with regard to using "\r\n" as a strict literal is that I have had instances where incoming text has had the order (annoyingly!) reversed (LFCR instead of CRLF). Since then I've preferred "[\r\n]{2}" instead of the strict literal "\r\n", so that's why I reflexively went back to that pattern. – NetXpert Mar 30 '19 at 19:41
  • Also, also (lol) -- I predominantly use https://regex101.com/ for regex testing mainly because of the intense amount of diagnostics it makes available for really drilling down into the pattern's syntax, and the amount of readily accessible help it gives. Obviously, though, it clearly has this issue you identified wrt CRLF vs LF when parsing the source text, so it's good to have another resource to check against! – NetXpert Mar 30 '19 at 19:45