0

I want to write a regex that matches a paragraph containing a specified phrase ("the Agency"). Currently my regex starts matching too early. I think I should use [^] somehow, but I have no idea how. Can you help me with that?

My current regex is \n(\d+.\s+.?the Agency.?)(\n\d+.\s) (https://regex101.com/r/osJPVK/1). In the sample text below, I want to match the text from "3." to "4." because it contains the phrase "the Agency".

where sucts. 
1. The objectives of the STh other arrangements are not inconsistent or in conflict with this licence 
or the STC or other relevant statutory requirements. 
3. The objectives of the STC referred to in sub-paragraph 1(c) are the:
(a) efficient discharge of the obligations imposed upon transmission licensees by 
transmission licences and the Act;
(b) development, maintenance and operation of an efficient, economical consistent therewith) facilitating such competition in the sion Licence: Standard Conditions – 1 April 2022
91
(g) compliance with the Electricity Regulation and any relevant legally binding 
decision of the European Commission and/or the Agency.
4. The STC shall provide for:
(a) there to be referred to the Authority for determination such matters arising under 
the STC as may be specified in the STC; 
(b) a copy of the STC or any part(s) thereof


  • It is going to be rather ugly, `^\d+\.\s.*(?:\n(?!\d+\.\s).*)*\n.*?the Agency.*(?:\n(?!\d+\.\s).*)*`, see https://regex101.com/r/osJPVK/2. Remember to remove the `s` flag with this pattern. – Wiktor Stribiżew Feb 19 '23 at 18:40
  • Hi. It works! Can you elaborate more about what and why? It would be great but not necessary. :) – Wojciech Rogman Feb 19 '23 at 18:54
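
As a partial answer to the "what and why" in the comment above, here is a rough Python sketch of the pattern from the first comment. The sample text is a shortened stand-in for the text in the question, and the names `SAMPLE` and `LINEWISE` are made up for illustration. The pattern works line by line, which is why `.` must not match newlines (no `s` flag, so no `re.DOTALL` here).

```python
import re

# Shortened stand-in for the sample text in the question (an assumption).
SAMPLE = (
    "1. The objectives of the other arrangements are not inconsistent\n"
    "or in conflict with this licence.\n"
    "3. The objectives of the STC referred to in sub-paragraph 1(c) are the:\n"
    "(a) efficient discharge of the obligations imposed;\n"
    "(g) compliance with the Electricity Regulation and any relevant legally binding\n"
    "decision of the European Commission and/or the Agency.\n"
    "4. The STC shall provide for:\n"
    "(a) there to be referred to the Authority for determination such matters.\n"
)

LINEWISE = re.compile(
    r"^\d+\.\s.*"              # first line of a numbered paragraph
    r"(?:\n(?!\d+\.\s).*)*"    # following lines that do not start a new paragraph
    r"\n.*?the Agency.*"       # a later line that contains the phrase
    r"(?:\n(?!\d+\.\s).*)*",   # remaining lines of the same paragraph
    re.MULTILINE,              # '^' matches at line starts; '.' stops at line ends
)

linewise_matches = [m.group(0) for m in LINEWISE.finditer(SAMPLE)]
print(linewise_matches[0])
```

Note that, as written, this pattern only finds the phrase on a line after the paragraph's first line, since it requires a `\n` before the line holding "the Agency".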

2 Answers

1

Looking into the future, this question is very similar to this one. You can match each such paragraph using the following regular expression

^\d+\.\s+(?:(?!^\d+\.\s).)*\bthe Agency\b(?:(?!^\d+\.\s).)*

with the following flags:

  • g: "global", do not return after the first match
  • m: "multiline", causing '^' and '$' to respectively match the beginning of a line (as opposed to matching the beginning and end of the string)
  • s: "single-line mode", . matches all characters, including line terminators

Demo

The expression can be broken down as follows.

^                # match beginning of a line
\d+\.\s+         # match 1+ digits then '.' then 1+ whitespaces
(?:              # begin a non-capture group
  (?!            # begin a negative lookahead
    ^            # match beginning of a line
    \d+\.\s      # match 1+ digits then '.' then one whitespace
  )              # end the negative lookahead
  .              # match any character, including line terminators
)                # end non-capture group
*                # execute the non-capture group 0+ times
\bthe Agency\b   # match 'the Agency' with word breaks on both sides
(?:              # begin a non-capture group
  (?!            # begin a negative lookahead
    ^            # match beginning of a line
    \d+\.\s      # match 1+ digits then '.' then one whitespace
  )              # end the negative lookahead
  .              # match any character, including line terminators
)                # end non-capture group
*                # execute the non-capture group 0+ times
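
For anyone trying this outside regex101, here is a minimal Python sketch of the same expression. It assumes the three flags translate as follows: `g` becomes `finditer` (return every match), `m` becomes `re.MULTILINE`, and `s` becomes `re.DOTALL`. The sample text is a shortened stand-in for the text in the question, and the names `SAMPLE_TEXT` and `PARAGRAPH` are only illustrative.

```python
import re

# Shortened stand-in for the sample text in the question (an assumption).
SAMPLE_TEXT = (
    "1. The objectives of the other arrangements are not inconsistent\n"
    "or in conflict with this licence.\n"
    "3. The objectives of the STC referred to in sub-paragraph 1(c) are the:\n"
    "(a) efficient discharge of the obligations imposed;\n"
    "(g) compliance with the Electricity Regulation and any relevant legally binding\n"
    "decision of the European Commission and/or the Agency.\n"
    "4. The STC shall provide for:\n"
    "(a) there to be referred to the Authority for determination such matters.\n"
)

PARAGRAPH = re.compile(
    r"^\d+\.\s+(?:(?!^\d+\.\s).)*\bthe Agency\b(?:(?!^\d+\.\s).)*",
    re.MULTILINE | re.DOTALL,  # the 'm' and 's' flags
)

# finditer plays the role of the 'g' flag: it yields every match in turn.
paragraphs = [m.group(0) for m in PARAGRAPH.finditer(SAMPLE_TEXT)]
print(paragraphs[0])
```

With this sample only the paragraph numbered 3 is returned. The match includes the newline just before "4.", so you may want to call `.rstrip()` on each result.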
Cary Swoveland
-1

^\d+\.((?!^\d+\.).)*the Agency((?!^\d+\.).)*

The leading ^ just says that the match has to start at the beginning of a line (this needs the multiline flag).

The tricky part is this:

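  • ((?!^\d+\.).)* : (?!...) is a negative lookahead; it asserts that the text at the current position does not match "...". Here is a nicely detailed answer. In this pattern, each character is consumed only if it is not at the beginning of a line that starts with "\d+\." (digits followed by a dot), so the match cannot run into the next numbered paragraph.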

https://regex101.com/r/tW3SyX/1
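
Here is a small Python sketch of the tricky part on its own, followed by the full pattern. The answer does not list flags, so this assumes multiline and dot-matches-newline (`re.MULTILINE | re.DOTALL`). The toy text and the variable names are made up for illustration.

```python
import re

text = "3. alpha\nbeta the Agency.\n4. gamma\n"

# The tempered dot alone: consume characters until the next numbered line starts.
tempered = re.compile(r"(?:(?!^\d+\.).)*", re.MULTILINE | re.DOTALL)
rest_of_paragraph = tempered.match(text, 2).group(0)  # start right after "3."

# The full pattern from this answer, with the flags assumed above.
pattern = re.compile(
    r"^\d+\.((?!^\d+\.).)*the Agency((?!^\d+\.).)*",
    re.MULTILINE | re.DOTALL,
)

# The pattern has capturing groups, so re.findall would return the groups'
# contents; finditer with group(0) gives the whole matched paragraph instead.
matched = [m.group(0) for m in pattern.finditer(text)]
print(repr(rest_of_paragraph))
print(matched)
```

The tempered dot stops right before "4." because at that position the lookahead sees a line start followed by digits and a dot.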

edd