The "paragraphs" of interest can be obtained by matching the following regular expression.
^\d+\.\d+\s(?:(?!^\d+\.\d+\s).)*\bTEST\b(?:(?!^\d+\.\d+\s).)*
with the following flags:
g: "global", do not return after the first match
m: "multiline", causing '^' and '$' to respectively match the beginning and end of a line (as opposed to matching the beginning and end of the string)
s: "single-line mode", causing . to match all characters, including line terminators
Demo
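As a minimal sketch of how the expression and flags might be applied in Python (my choice of language for illustration; the sample text is made up): the m and s flags correspond to re.MULTILINE and re.DOTALL, and the effect of g is obtained by calling findall, which returns every match.

```python
import re

# Hypothetical sample text: three numbered "paragraphs", two of which contain TEST.
text = (
    "1.1 First paragraph\n"
    "continues with TEST here.\n"
    "1.2 Second paragraph\n"
    "has nothing of interest.\n"
    "2.1 Third paragraph, TEST on the first line\n"
    "and a trailing line.\n"
)

pattern = re.compile(
    r"^\d+\.\d+\s(?:(?!^\d+\.\d+\s).)*\bTEST\b(?:(?!^\d+\.\d+\s).)*",
    re.MULTILINE | re.DOTALL,  # the "m" and "s" flags
)

# findall plays the role of the "g" flag: it returns all matches, not just the first.
matches = pattern.findall(text)

for m in matches:
    print(repr(m))
```

Each match runs from a paragraph's number up to, but not including, the number that starts the next paragraph.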
The expression can be broken down as follows.
^                 # match beginning of a line
\d+\.\d+\s        # match 1+ digits then '.' then 1+ digits then a whitespace
(?:               # begin a non-capture group
  (?!             # begin a negative lookahead
    ^             # match beginning of a line
    \d+\.\d+\s    # match 1+ digits then '.' then 1+ digits then a whitespace
  )               # end the negative lookahead
  .               # match any character, including line terminators
)                 # end non-capture group
*                 # execute the non-capture group 0+ times
\bTEST\b          # match 'TEST' with word breaks on both sides
(?:               # begin a non-capture group
  (?!             # begin a negative lookahead
    ^             # match beginning of a line
    \d+\.\d+\s    # match 1+ digits then '.' then 1+ digits then a whitespace
  )               # end the negative lookahead
  .               # match any character, including line terminators
)                 # end non-capture group
*                 # execute the non-capture group 0+ times
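A commented breakdown like this can also be kept in the code itself. The following is a sketch, assuming Python's re module, where the re.VERBOSE flag makes the engine ignore unescaped whitespace and # comments inside the pattern; the variable names and sample text are my own.

```python
import re

# The compact form, for comparison.
compact = re.compile(
    r"^\d+\.\d+\s(?:(?!^\d+\.\d+\s).)*\bTEST\b(?:(?!^\d+\.\d+\s).)*",
    re.MULTILINE | re.DOTALL,
)

# The same expression in free-spacing form with inline comments.
verbose = re.compile(
    r"""
    ^ \d+\.\d+\s                  # paragraph number at the start of a line
    (?: (?! ^\d+\.\d+\s ) . )*    # any characters, stopping before the next paragraph number
    \b TEST \b                    # the word TEST
    (?: (?! ^\d+\.\d+\s ) . )*    # remainder of the paragraph
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)

sample = "1.1 alpha TEST\nmore\n1.2 beta\n1.3 gamma\nTEST again\n"
compact_matches = compact.findall(sample)
verbose_matches = verbose.findall(sample)
```

Both patterns should return the same matches; the verbose form only changes how the expression is written, not what it matches.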
The technique of matching one character at a time with a negative lookahead (here (?:(?!^\d+\.\d+\s).)) is called the tempered greedy token solution.
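To see why the lookahead is needed, here is a small Python sketch (the sample text is invented) comparing the tempered pattern with a version that uses a plain .* in its place. Without the lookahead, .* in single-line mode runs straight across paragraph boundaries.

```python
import re

text = (
    "1.1 First paragraph with TEST.\n"
    "1.2 Second paragraph, nothing here.\n"
    "2.1 Third paragraph with TEST.\n"
)
flags = re.MULTILINE | re.DOTALL

# Plain greedy dot: nothing stops it at the next paragraph number.
untempered = re.findall(r"^\d+\.\d+\s.*\bTEST\b.*", text, flags)

# Tempered dot: each character is consumed only if a new paragraph does not start there.
tempered = re.findall(
    r"^\d+\.\d+\s(?:(?!^\d+\.\d+\s).)*\bTEST\b(?:(?!^\d+\.\d+\s).)*", text, flags
)

print(len(untempered), len(tempered))
```

The untempered pattern produces a single match covering the whole text, including paragraph 1.2, whereas the tempered pattern returns paragraphs 1.1 and 2.1 separately.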
Note that there is quite a bit of duplication in this regular expression. Many regex engines permit the use of subroutines (or subexpressions) to reduce the duplication. With the PCRE engine (which I used at the "Demo" link), for example, you could write
(^\d+\.\d+\s)((?:(?!(?1)).)*)\bTEST\b(?2)
Demo
Here (?1) is replaced by the expression for capture group 1, ^\d+\.\d+\s, and (?2) is replaced by the expression for capture group 2, (?:(?!(?1)).)*.
This is perhaps clearer if we use named capture groups.
(?P<float>^\d+\.\d+\s)(?P<beforeTEST>(?:(?!(?P>float)).)*)\bTEST\b(?P>beforeTEST)
Demo
One advantage of the use of subroutines is that it avoids some cut-and-paste copying errors.
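Python's built-in re module does not support subroutine calls such as (?1) or (?P>name). A comparable reduction in duplication can be had there by assembling the pattern from named string fragments. This is a different technique from regex subroutines, sketched here with names (HEADER, BODY) of my own choosing.

```python
import re

HEADER = r"^\d+\.\d+\s"          # paragraph number, e.g. "1.1 "
BODY = rf"(?:(?!{HEADER}).)*"    # tempered token: anything up to the next paragraph number

# Each fragment is written once and reused, much as (?1) and (?2) reuse groups in PCRE.
pattern = re.compile(rf"{HEADER}{BODY}\bTEST\b{BODY}", re.MULTILINE | re.DOTALL)

sample = "1.1 no match here\n1.2 this has TEST in it\nand more\n"
found = pattern.findall(sample)
```

The assembled pattern string is identical to the hand-written expression at the top of this answer, so the fragments only change how it is written, not how it behaves.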