Regex for parsing blocks of file

Question

I have a file formatted like this:

BEGIN
   xxx
END;
BEGIN
   xxx
EXCEPTION
   xxx
END;
BEGIN
   xxx
EXCEPTION
   xxx
END;

What I need is only the data between BEGIN and EXCEPTION; the BEGIN-END and EXCEPTION-END blocks should be ignored. I have written a regex, but it's not giving me the desired output:

body=re.findall(r'BEGIN.*^[^BEGIN].*EXCEPTION', data, re.MULTILINE|re.DOTALL)

I also want to remove BEGIN and EXCEPTION from the output. Can I do that with the regex itself? Alternatively, I could use a replace function. Please help.

Something like [`(?m)^BEGIN(?:\n(?!BEGIN$).*)*\nEXCEPTION$`](https://regex101.com/r/RWtnBz/4)? You cannot use `re.DOTALL` with this "unrolled" expression. — Wiktor Stribiżew, Jan 30 '18 at 13:37
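
A minimal sketch of how the pattern from the comment above might be tried in Python. The sample `data`, the added capture group, and passing `re.MULTILINE` as a flag instead of the inline `(?m)` are assumptions made for illustration, not part of the original comment. The second call suggests why `re.DOTALL` should be left out: once `.` can cross newlines, the separate blocks collapse into a single match.

```python
import re

# Sample input modelled on the question; the body text is made up.
data = """BEGIN
   first
END;
BEGIN
   second
EXCEPTION
   handler
END;
BEGIN
   third
EXCEPTION
   handler
END;
"""

# The comment's pattern, with a capture group added around the lines
# between BEGIN and EXCEPTION so the keywords are dropped from the result.
pattern = r"^BEGIN((?:\n(?!BEGIN$).*)*)\nEXCEPTION$"

# MULTILINE only: '.' stops at line ends, so each block is matched on its own.
bodies = [body.strip() for body in re.findall(pattern, data, re.MULTILINE)]
print(bodies)  # ['second', 'third']

# With DOTALL added, '.*' runs across newlines and the blocks merge.
merged = re.findall(pattern, data, re.MULTILINE | re.DOTALL)
print(len(merged))
```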

Gurmanjot Singh · Accepted Answer · 2018-01-30T13:43:38.403

Try this regex:

BEGIN(?:(?!END)[\s\S])*EXCEPTION

OR

(?<=BEGIN)(?:(?!END)[\s\S])*(?=EXCEPTION)

Explanation (1st regex):

BEGIN - matches BEGIN
(?:(?!END)[\s\S])* - tempered greedy token: matches 0+ characters (including newlines), each checked so that it is not the start of the substring END
EXCEPTION - matches EXCEPTION
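
As a rough sketch of how the first regex could be used from Python: wrapping the tempered greedy token in a capture group makes `re.findall` return only the text between the keywords, so no separate replace step is needed. The sample `data`, the capture group, and the `strip()` call are assumptions added for illustration.

```python
import re

# Sample input modelled on the question; the body text is made up.
data = """BEGIN
   first
END;
BEGIN
   second
EXCEPTION
   handler
END;
BEGIN
   third
EXCEPTION
   handler
END;
"""

# Same regex as above, with a capture group around the tempered greedy token.
# [\s\S] already matches newlines, so no flags are required.
pattern = re.compile(r"BEGIN((?:(?!END)[\s\S])*)EXCEPTION")

# findall returns the contents of the single group for each match.
bodies = [body.strip() for body in pattern.findall(data)]
print(bodies)  # ['second', 'third']
```

The plain BEGIN-END block is skipped because the token refuses to step over `END`, so no `EXCEPTION` can be reached from that `BEGIN`.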

edited Jan 30 '18 at 13:43

answered Jan 30 '18 at 13:38

Gurmanjot Singh


@Sarthak [They work to some extent](https://regex101.com/r/sD1fTZ/1) only. – Wiktor Stribiżew Jan 30 '18 at 13:43
@WiktorStribiżew Thanks for pointing that out. Would this work [`BEGIN(?:(?!\bEND\b)[\s\S])*EXCEPTION`](https://regex101.com/r/sD1fTZ/2) in that case? – Gurmanjot Singh Jan 30 '18 at 13:47
@Gurman Probably. If there are no string literals or comments inside with `END`. That is why I anchored BEGIN and END at line start/end. – Wiktor Stribiżew Jan 30 '18 at 13:49
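
To illustrate the caveat in the last comment, here is a small sketch comparing the word-boundary variant with a line-anchored one on a block whose body contains `END` inside a string literal. The sample text (including the made-up `log` call) and the added capture group are assumptions for the example.

```python
import re

# Hypothetical block whose body mentions END inside a string literal.
data = """BEGIN
   log('END of step');
EXCEPTION
   handler
END;
"""

# Word-boundary variant: the tempered token stops at the END in the literal,
# so EXCEPTION is never reached and nothing matches.
tempered = re.findall(r"BEGIN(?:(?!\bEND\b)[\s\S])*EXCEPTION", data)
print(tempered)  # []

# Line-anchored variant: keywords only count when they fill a whole line.
anchored = re.findall(
    r"^BEGIN((?:\n(?!BEGIN$).*)*)\nEXCEPTION$", data, re.MULTILINE
)
print([body.strip() for body in anchored])  # ["log('END of step');"]
```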
