
I want to capture a whole line and, optionally, an ID matching the regex pattern H-\d{4}, like H-1234.

These are two sample lines, one with an ID and one without:

Sample line with H-5722 id

Sample line without id

In the first line, ALL should capture the whole line and ID should capture H-5722. In the second, ALL should capture the whole line and ID should be empty.

This regex works for the first line, capturing ALL and ID:

^(?<ALL>.*?(?<ID>H-\d{4})\b.*)$

but it doesn't match the second line, as expected, because it doesn't have an ID.
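A minimal C# sketch of that behaviour, assuming .NET's System.Text.RegularExpressions and C# 9+ top-level statements (the variable name `required` is only illustrative): the pattern succeeds on the first sample and fails on the second, because the ID group is mandatory.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question: the ID group is not optional.
var required = new Regex(@"^(?<ALL>.*?(?<ID>H-\d{4})\b.*)$");

foreach (var line in new[] { "Sample line with H-5722 id", "Sample line without id" })
{
    Match m = required.Match(line);
    // Success is false for the second line because H-\d{4} must appear somewhere.
    Console.WriteLine($"{line}: Success={m.Success}, ID='{(m.Success ? m.Groups["ID"].Value : "(no match)")}'");
}
```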

So I've tried to make the ID capture optional, either by wrapping it in a non-capturing group with the ? (zero-or-one) quantifier, (?:(?<ID>H-\d{4}))?, or by modifying the ID group so that it captures either the expression or an empty string, (?<ID>H-\d{4}|):

^(?<ALL>.*?(?:(?<ID>H-\d{4})\b)?.*)$

^(?<ALL>.*?(?<ID>H-\d{4}|)\b.*)$

With these modifications, ALL captures the whole line in both examples, but ID is never captured.
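To see the symptom concretely, here is a small C# sketch (assuming .NET's System.Text.RegularExpressions and C# 9+ top-level statements; `attempts` and `sample` are illustrative names) that runs both optional-ID attempts against the first sample line.

```csharp
using System;
using System.Text.RegularExpressions;

// The two "optional ID" attempts from the question.
string[] attempts =
{
    @"^(?<ALL>.*?(?:(?<ID>H-\d{4})\b)?.*)$",
    @"^(?<ALL>.*?(?<ID>H-\d{4}|)\b.*)$",
};

const string sample = "Sample line with H-5722 id";

foreach (var p in attempts)
{
    Match m = Regex.Match(sample, p);
    // ALL holds the whole line, but ID stays empty even though H-5722 is present:
    // the lazy .*? matches nothing, the optional/empty ID succeeds at position 0,
    // and .* swallows the rest, so the engine has no reason to backtrack.
    Console.WriteLine($"ALL='{m.Groups["ALL"].Value}' ID='{m.Groups["ID"].Value}'");
}
```

Both attempts print the full line for ALL and an empty ID.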

How can I achieve this?

I'm using the .NET regex implementation, but I think it's very similar to other implementations.

JotaBe

4 Answers


Using alternations:

^(?<ALL>(?!.*H-\d{4}\b).*|.*?(?:(?<ID>H-\d{4})\b).*)$

See https://regex101.com/r/dZx3b1/1/
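A minimal C# check of the alternation above, assuming .NET's System.Text.RegularExpressions and C# 9+ top-level statements (the name `withAlternation` is illustrative). The first branch can only match lines with no ID, so a line containing an ID is forced into the second branch, where the ID group is mandatory.

```csharp
using System;
using System.Text.RegularExpressions;

// Branch 1: (?!.*H-\d{4}\b).*  -> only lines without an ID.
// Branch 2: .*?(?:(?<ID>H-\d{4})\b).*  -> lines with an ID, which must be captured.
var withAlternation = new Regex(@"^(?<ALL>(?!.*H-\d{4}\b).*|.*?(?:(?<ID>H-\d{4})\b).*)$");

foreach (var line in new[] { "Sample line with H-5722 id", "Sample line without id" })
{
    Match m = withAlternation.Match(line);
    // Groups["ID"].Success tells whether the ID group participated in the match.
    Console.WriteLine($"ALL='{m.Groups["ALL"].Value}' ID found={m.Groups["ID"].Success} ID='{m.Groups["ID"].Value}'");
}
```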

Alternatively, using an unrolled tempered greedy token (for performance):

^(?<ALL>[^H\n]*(?:H(?!-\d{4}\b)[^H\n]*)*(?<ID>H-\d{4}\b)?.*)$

See https://regex101.com/r/9ILEhw/1/
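The unrolled variant can be sketched the same way in C# (assuming .NET's System.Text.RegularExpressions and C# 9+ top-level statements). The third input, "Hello H-1234 world", is a made-up line added to show that an H which does not start an ID is simply stepped over.

```csharp
using System;
using System.Text.RegularExpressions;

// [^H\n]* skips non-H characters in bulk; H(?!-\d{4}\b) only consumes an H that
// does NOT start an ID, so the optional ID group gets the first real ID.
var unrolled = new Regex(@"^(?<ALL>[^H\n]*(?:H(?!-\d{4}\b)[^H\n]*)*(?<ID>H-\d{4}\b)?.*)$");

// The question's two samples plus a hypothetical line whose first "H" is not an ID.
string[] lines = { "Sample line with H-5722 id", "Sample line without id", "Hello H-1234 world" };

foreach (var line in lines)
{
    Match m = unrolled.Match(line);
    Console.WriteLine($"ALL='{m.Groups["ALL"].Value}' ID='{m.Groups["ID"].Value}'");
}
```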

Both patterns basically force the ID group to be used whenever an ID can be found.

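Your approach fails because .*? first tries to match the empty string at the start of the line. The optional ID pattern cannot match there, so it is skipped, and .* then matches the rest of the line. Since that already yields a successful match, the engine never backtracks to try the ID at a later position.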

Sebastian Proske
  • I'm sorry: the idea and the explanation look fine, but it doesn't work. It needs some adjustment (I didn't downvote, however, to let you make the adjustment) – JotaBe Jan 17 '18 at 13:57
  • @JotaBe it's working fine in regex101 for me, see the links (unless I misunderstood something). I also tested in http://regexstorm.net/tester with the same results to confirm it's not an engine problem - however, this site doesn't allow linking to a sample. – Sebastian Proske Jan 17 '18 at 14:03
  • Thank you, Sebastian. Now it's working perfectly. Besides, you explain why. – JotaBe Jan 17 '18 at 14:13
  • Just to let you know: `(?J)` at the start of a pattern enables using the same named groups in `PCRE` as well, see https://regex101.com/r/T2UXuY/2 – Jan Jan 17 '18 at 14:26
  • Who's talking about PCRE @Jan? – revo Jan 17 '18 at 14:29
  • @revo: Sebastian and I were talking about PCRE underneath my own answer - this has nothing to do with the actual question, though. – Jan Jan 17 '18 at 14:32

In .NET you can use

(?:(?<ALL>.*(?<ID>\bH-\d{4}\b).*)|(?<ALL>.+))

See a working demo on regex101.com.


Broken down, this says:
(?:                                 # open non-capturing group
   (?<ALL>.*(?<ID>\bH-\d{4}\b).*)   # with ID
   |                                # or
   (?<ALL>.+)                       # without ID
)

Whatever your content, ALL holds the complete line, and ID is only present if there is indeed an ID of the form H-1234. As stated in the comments, reusing the group name ALL is allowed in .NET (see here on SO) but is invalid syntax in PCRE and the like, unless PCRE's (?J) flag is added at the start of the pattern to allow duplicate names.
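A minimal C# sketch of this pattern, assuming .NET's System.Text.RegularExpressions and C# 9+ top-level statements (the variable names are illustrative). Because both branches use the name ALL, Groups["ALL"] returns the whole line either way, and Groups["ID"].Success distinguishes the two cases.

```csharp
using System;
using System.Text.RegularExpressions;

// Both alternatives define a group named ALL; .NET accepts the duplicate name.
var pattern = new Regex(@"(?:(?<ALL>.*(?<ID>\bH-\d{4}\b).*)|(?<ALL>.+))");

foreach (var line in new[] { "Sample line with H-5722 id", "Sample line without id" })
{
    Match m = pattern.Match(line);
    Group id = m.Groups["ID"];
    // id.Success is false when the second ("without ID") branch matched.
    Console.WriteLine(id.Success
        ? $"ALL='{m.Groups["ALL"].Value}' ID='{id.Value}'"
        : $"ALL='{m.Groups["ALL"].Value}' (no ID)");
}
```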

Jan

Try a more specific alternation:

^(?<ALL>[^H\n\r]*(?:(?<ID>H-\d{4}).*|.[^H\n\r]*)*)

Not the shortest, but the fastest.
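A C# sketch exercising this pattern, assuming .NET's System.Text.RegularExpressions and C# 9+ top-level statements. The extra input "Hello H-1234 world" is hypothetical and shows the second alternative stepping over an H that does not start an ID.

```csharp
using System;
using System.Text.RegularExpressions;

// [^H\n\r]* skips non-H characters; each loop iteration either captures the ID
// plus the rest of the line, or steps over one character (normally an H that
// does not start an ID) and skips the next run of non-H characters.
var pattern = new Regex(@"^(?<ALL>[^H\n\r]*(?:(?<ID>H-\d{4}).*|.[^H\n\r]*)*)");

// The question's samples plus a hypothetical line starting with a non-ID "H".
string[] lines = { "Sample line with H-5722 id", "Sample line without id", "Hello H-1234 world" };

foreach (var line in lines)
{
    Match m = pattern.Match(line);
    Console.WriteLine($"ALL='{m.Groups["ALL"].Value}' ID='{m.Groups["ID"].Value}'");
}
```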


revo

The following pattern seems to work:

^((?:(?!H-\d{4}).)*(H-\d{4})?\b.*)$

If an H ID is present, it is available in the second capture group; if not, the second capture group is empty. In either case, the entire line appears in the first capture group.

using System;
using System.Text.RegularExpressions;

string input = "Sample line with H-5722 id";   // H-\d{4} needs four digits
Regex r1 = new Regex(@"^((?:(?!H-\d{4}).)*(H-\d{4})?\b.*)$");
Match match = r1.Match(input);
if (match.Success)
{
    Console.WriteLine("First capture group: {0}", match.Groups[1].Value);
    Console.WriteLine("Second capture group: {0}", match.Groups[2].Value);
}
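To check both sample lines at once, here is a small C# variation (assuming .NET's System.Text.RegularExpressions and C# 9+ top-level statements). It adds a hypothetical input whose ID has only three digits (H-123), which the pattern deliberately does not capture because H-\d{4} requires four digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Numbered groups: group 1 = whole line, group 2 = optional ID.
var r1 = new Regex(@"^((?:(?!H-\d{4}).)*(H-\d{4})?\b.*)$");

// The question's two samples, plus a hypothetical input with a three-digit ID.
string[] inputs = { "Sample line with H-5722 id", "Sample line without id", "Sample line with H-123 id" };

foreach (var input in inputs)
{
    Match match = r1.Match(input);
    // Groups[2].Success is false whenever the optional (H-\d{4})? group did not participate.
    Console.WriteLine($"1='{match.Groups[1].Value}' 2='{match.Groups[2].Value}' captured={match.Groups[2].Success}");
}
```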


Tim Biegeleisen