I have a file formatted like this:

BEGIN
   xxx
END;
BEGIN
   xxx
EXCEPTION
   xxx
END;
BEGIN
   xxx
EXCEPTION
   xxx
END;

What I need is only the text between BEGIN and EXCEPTION. The BEGIN-END blocks that have no EXCEPTION, and the EXCEPTION-END parts, should be ignored. I have written a regex, but it is not giving me the desired output:

body=re.findall(r'BEGIN.*^[^BEGIN].*EXCEPTION', data, re.MULTILINE|re.DOTALL)

I also want to remove the BEGIN and EXCEPTION keywords from the output. Can I do that with the regex itself? Alternatively, I could use the replace function. Please help.

Sarthak
    Something like [`(?m)^BEGIN(?:\n(?!BEGIN$).*)*\nEXCEPTION$`](https://regex101.com/r/RWtnBz/4)? You cannot use `re.DOTALL` with this "unrolled" expression. – Wiktor Stribiżew Jan 30 '18 at 13:37
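For illustration, here is a minimal sketch of how the line-anchored pattern from the comment above might be used from Python. The sample `data` string and the names `pattern` and `bodies` are assumptions made for this example, the capture group is added so that only the body is returned, and `re.MULTILINE` is passed as a flag in place of the inline `(?m)`.

```python
import re

# Hypothetical sample input shaped like the file in the question.
data = """BEGIN
   first
END;
BEGIN
   second
EXCEPTION
   handler
END;
BEGIN
   third
EXCEPTION
   handler
END;
"""

# Each repetition consumes one whole line, but never a line that is
# exactly "BEGIN", so a match cannot run into the next block.
# No re.DOTALL here: ".*" must stop at the end of each line.
pattern = re.compile(r"^BEGIN((?:\n(?!BEGIN$).*)*)\nEXCEPTION$", re.MULTILINE)

# Group 1 holds only the lines between BEGIN and EXCEPTION.
bodies = [body.strip() for body in pattern.findall(data)]
print(bodies)
```

Because the keywords are anchored to whole lines, a body line that merely contains the letters END or BEGIN somewhere inside it should not confuse this variant.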

1 Answer

Try this Regex:

BEGIN(?:(?!END)[\s\S])*EXCEPTION

OR

(?<=BEGIN)(?:(?!END)[\s\S])*(?=EXCEPTION)

Explanation (1st regex):

  • BEGIN - matches BEGIN
  • (?:(?!END)[\s\S])* - tempered greedy token: matches 0+ characters (including newlines), checking before each one that the text at that position does not start with END
  • EXCEPTION - matches EXCEPTION
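
As a rough sketch of how the first regex could be applied in Python, the snippet below wraps the tempered token in a capture group so that `re.findall` returns only the text between the keywords, which also covers the request to drop BEGIN and EXCEPTION from the output. The sample `data` string and the variable names are assumptions made for this example.

```python
import re

# Hypothetical sample input shaped like the file in the question.
data = """BEGIN
   first
END;
BEGIN
   second
EXCEPTION
   handler
END;
BEGIN
   third
EXCEPTION
   handler
END;
"""

# [\s\S] matches any character including newlines, so re.DOTALL is not needed.
# The (?!END) check stops the match from crossing an END, which is why the
# plain BEGIN ... END; block produces no result.
pattern = re.compile(r"BEGIN((?:(?!END)[\s\S])*)EXCEPTION")

# With one capture group, findall returns the group instead of the full match.
bodies = [body.strip() for body in pattern.findall(data)]
print(bodies)
```

One caveat with this sketch: the check is on the bare substring END, so a body that contains those letters anywhere (for example inside a longer uppercase identifier) would not be matched.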
Gurmanjot Singh