
I would like to extract the block that starts at the number nearest to a given section heading. This is the regex I am using: \d+?[\r\n]+(.*)3.2.P.4.4.\s+Justification\s+of\s+Specifications

Objective: find a block that starts with a number and ends with a given section name. In this case, the section name is "3.2.P.4.4. Justification of Specifications".

Actual result: the regex matches all of the content, because the match starts at the first number in the text. Expected result: the match should start at 29, which is the number nearest to the section. I tried numerous options, such as ungreedy quantifiers, but none of them seems to work.

https://regex101.com/r/Othmck/2

Suresh Kumar
  • Are you sure it is used in a .NET app? Regex101 does not support .NET regex. BTW, if you have a testing code snippet please add it to the question. Also, is the block you want to match always at the end of the string? If yes, the regex will be much simpler. – Wiktor Stribiżew Feb 24 '19 at 09:43

2 Answers


You might use a negative lookahead to assert that the next line does not start with whitespace chars followed by digits and a newline:

^ \d+[\r\n](?:(?!\s+\d+[\r\n]).*[\r\n])*3\.2\.P\.4\.4\.\sJustification\s+of\s+Specifications

See a regex .NET demo | C# demo

Explanation

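  • ^ Start of a line (this relies on RegexOptions.Multiline; without it, ^ matches only at the start of the string)
  • " \d+[\r\n]" Match a space, 1+ digits and a newline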
  • (?: Non capturing group
    • (?! Negative lookahead to assert what follows is not
      • \s+\d+[\r\n] Match 1+ whitespace chars, 1+ digits and newline
    • ) Close negative lookahead
    • .*[\r\n] Match the rest of the line, followed by a newline
  • )* Close non capturing group and repeat 0+ times
  • 3\.2\.P\.4\.4\.\sJustification\s+of\s+Specifications Match section name
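
As a rough illustration of the pattern explained above, here is a minimal C# sketch. It assumes a shortened sample text with "\n" line endings and that RegexOptions.Multiline is enabled so that ^ can match at the start of any line. The LookaheadDemo class and the NearestNumber helper are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    // Hypothetical helper: returns the number line that opens the matched block, or null.
    public static string NearestNumber(string text)
    {
        var pattern = @"^ \d+[\r\n](?:(?!\s+\d+[\r\n]).*[\r\n])*3\.2\.P\.4\.4\.\sJustification\s+of\s+Specifications";
        // Multiline lets ^ match at the start of every line, not only at the start of the string.
        var m = Regex.Match(text, pattern, RegexOptions.Multiline);
        // The first line of the match is the number line; trim the leading space.
        return m.Success ? m.Value.Split('\n')[0].Trim() : null;
    }

    static void Main()
    {
        // Shortened sample (an assumption; the real input is longer and may use "\r\n").
        var text = " 26\nData related to the point\n 28\nTEST SPECIFICATIONS RESULTS\n 29\n"
                 + "3.2.P.4.2 Analytical Procedures\nAll the analytical procedures\n"
                 + "3.2.P.4.4. Justification of Specifications";
        Console.WriteLine(NearestNumber(text)); // 29
    }
}
```

The lookahead stops the repeated group as soon as it reaches another number line, so attempts that start at 26 or 28 fail and only the attempt that starts at 29 can reach the section name.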
The fourth bird

In .NET, you can use the RegexOptions.RightToLeft option to parse the text from its end to its beginning, which finds the last match much more quickly and with a simpler pattern.

Use

using System;
using System.Text.RegularExpressions;

var text = " 26\r\nData related to the point SP-WFI-21-Room process fluids  \r\nSampling Date:16/04/2007 \r\n 28\r\nData related to pint SP-WFI-21-Room process fluids  \r\nSampling Date: 20/04/2007 \r\nTEST SPECIFICATIONS RESULTS \r\n 29\r\n3.2.P.4.2 Analytical Procedures \r\nAll the analytical procedures \r\n3.2.P.4.3 Validation of Analytical Procedures \r\nAll the analytical procedures proposed to control the excipients are those reported in Ph. Eur. \r\n− 3AQ13A: Validation of Analytical Procedures: Methodology - EUDRALEX Volume 3A \r\n3.2.P.4.4. Justification of Specifications";
var pattern = @"^\s*\d+\s*[\r\n]+(.*?)3\.2\.P\.4\.4\.\s+Justification\s+of\s+Specifications";
var regEx = new Regex(pattern, RegexOptions.RightToLeft | RegexOptions.Singleline | RegexOptions.Multiline );

var m = regEx.Match(text);
if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value);
}

See the C# demo.

See the .NET regex demo

I basically just added ^ (in multiline mode, start of a line) and \s* after \d+ (just in case there are any spaces before the line break). Note the escaped dots.

Note that .NET regex does not support the U greediness-switching modifier, so the +? must be turned into + and the .* into .*?. In fact, the original regex contained + quantifiers that were probably meant to be +?, which might have led to other errors or unexpected behavior. Do not use the U modifier in PCRE unless you are 100% sure what you are doing.
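
To illustrate why making the quantifier lazy is not enough on its own, here is a small sketch that runs the same pattern with and without RegexOptions.RightToLeft. A left-to-right search still reports the leftmost match, so it starts at the first number line; laziness only limits how far the group extends. The sample text is shortened and uses "\n" line endings, and the DirectionDemo class and FirstCapturedLine helper are hypothetical names for this example only.

```csharp
using System;
using System.Text.RegularExpressions;

static class DirectionDemo
{
    const string Pattern = @"^\s*\d+\s*[\r\n]+(.*?)3\.2\.P\.4\.4\.\s+Justification\s+of\s+Specifications";

    // Hypothetical helper: first line of the captured block under the given options, or null.
    public static string FirstCapturedLine(string text, RegexOptions options)
    {
        var m = Regex.Match(text, Pattern, options);
        return m.Success ? m.Groups[1].Value.Split('\n')[0].Trim() : null;
    }

    static void Main()
    {
        // Shortened sample (an assumption; the real input is longer and may use "\r\n").
        var text = " 26\nData related to the point\n 29\n3.2.P.4.2 Analytical Procedures\n"
                 + "3.2.P.4.4. Justification of Specifications";
        var common = RegexOptions.Singleline | RegexOptions.Multiline;

        // Left to right: the match begins at " 26", the leftmost number line.
        Console.WriteLine(FirstCapturedLine(text, common));
        // prints: Data related to the point

        // Right to left: the match begins at " 29", the number line nearest to the section.
        Console.WriteLine(FirstCapturedLine(text, common | RegexOptions.RightToLeft));
        // prints: 3.2.P.4.2 Analytical Procedures
    }
}
```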

Wiktor Stribiżew