
Using Python, I need to parse a file with the following structure:

((Lorem) ipsum dolor sit amet)
(consectetur adipiscing elit.)(Etiam
suscipit
pulvinar congue.)
((Vivamus) eu faucibus enim.)

The result needs to be a list with the contents of everything in the brackets, i.e.,

[
    '(Lorem) ipsum dolor sit amet',
    'consectetur adipiscing elit.',
    'Etiam\nsuscipit\npulvinar congue.',
    '(Vivamus) eu faucibus enim.'
]

Since the brackets can be nested, perhaps regex is not the tool I'm looking for.

Any hints?

Nico Schlömer

3 Answers


You can do it with a recursive regex:

\(((?:[^()]|(?R))*)\)

This is almost exactly the real-world example for recursive patterns on regular-expressions.info, except for an added capture group.

Test it on regex101.com. It returns exactly your example output.

Note that Python's built-in re module does not support recursive patterns such as (?R), but the third-party regex module does. To implement the recursive regex, have a look at the answer to this question: How can a recursive regexp be implemented in python?
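A minimal sketch of the recursive pattern applied in Python, assuming the third-party `regex` package is installed (`pip install regex`), since the standard `re` module cannot handle `(?R)`. The names `TOP_LEVEL_GROUP` and `parse_recursive` are hypothetical. The sketch drops the capture group and strips the outer brackets from each whole match instead, so it does not depend on how group values behave inside recursion.

```python
# Requires the third-party "regex" module (pip install regex);
# the standard-library "re" module does not support (?R).
import regex

# (?R) recurses into the whole pattern, so balanced inner brackets
# are consumed as part of the enclosing match. [^()] also matches newlines.
TOP_LEVEL_GROUP = regex.compile(r'\((?:[^()]|(?R))*\)')


def parse_recursive(text):
    """Return the contents of each top-level (...) group in text."""
    # Strip the outer brackets from each whole match rather than
    # relying on capture-group values after recursion.
    return [m.group(0)[1:-1] for m in TOP_LEVEL_GROUP.finditer(text)]


sample = """((Lorem) ipsum dolor sit amet)
(consectetur adipiscing elit.)(Etiam
suscipit
pulvinar congue.)
((Vivamus) eu faucibus enim.)
"""
print(parse_recursive(sample))
```

Text between top-level groups (here, the newlines) is simply skipped by `finditer`, because it never starts a match.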

Imanuel

I think I would code this myself. I am far from a Python expert, so my solution may not be the usual Pythonic way. Initially set a counter to 0, then step through the string character by character. If the current character is '(', increase the counter by one; if it is ')', decrease it. Remember the position where the counter goes from 0 to 1: when the counter is back at 0 after a decrease, everything between that position and the current one is your next list entry. If the counter ever drops below zero, the brackets are unbalanced and you have an error, for example when an entry is missing its opening '('. How you handle that depends on what you want. This should be simple to implement.
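A minimal sketch of the counter approach; `split_top_level` is a hypothetical name. It records the index just after the '(' that raises the counter from 0 to 1 and slices out the entry when the counter returns to 0. Text outside any bracket (such as the newlines between groups) is ignored, and raising `ValueError` on unbalanced brackets is just one possible choice for the error case.

```python
def split_top_level(text):
    """Return the contents of each top-level (...) group in text."""
    entries = []
    depth = 0      # number of currently open '(' brackets
    start = None   # index just after the '(' that opened the current entry
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched ')' at index {i}")
            if depth == 0:
                # Counter is back at zero: a complete top-level entry.
                entries.append(text[start:i])
    if depth != 0:
        raise ValueError("unclosed '(' at end of input")
    return entries


text = ("((Lorem) ipsum dolor sit amet)\n(consectetur adipiscing elit.)(Etiam\n"
        "suscipit\npulvinar congue.)\n((Vivamus) eu faucibus enim.)\n")
print(split_top_level(text))
```

To parse the file itself, pass its contents, e.g. `with open(path) as f: entries = split_top_level(f.read())`, where `path` is your file name.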

Schorsch

All you need to implement this is a stack. Algorithm:

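  1. Parse the string character by character, pushing everything (except closing brackets) onto a stack.
  2. When you are about to push a closing bracket, start popping elements from the stack until you reach an opening bracket. The text between that opening bracket and the closing bracket is your next list element. If an opening bracket is still left on the stack (the group is nested), push the group back onto the stack as a single element, brackets included, so that it becomes part of the enclosing element instead. Repeat until you have parsed the complete string.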
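A minimal sketch of the stack approach in Python; `parse_with_stack` is a hypothetical name. Two details are choices of this sketch: characters outside any bracket (such as the newlines between groups) are not pushed at all, and a group that closes while an outer '(' is still open is pushed back as one string element, brackets included, so that `(Lorem)` ends up inside its enclosing entry rather than becoming a list element of its own.

```python
def parse_with_stack(text):
    """Return the contents of each top-level (...) group in text."""
    # Holds '(' markers, single characters, and already-closed nested groups.
    # Invariant: whenever the stack is non-empty, its bottom element is '('.
    stack = []
    entries = []
    for i, ch in enumerate(text):
        if ch == ')':
            # Step 2: pop until the matching '(' to recover the group's contents.
            parts = []
            while stack and stack[-1] != '(':
                parts.append(stack.pop())
            if not stack:
                raise ValueError(f"unmatched ')' at index {i}")
            stack.pop()  # drop the '(' itself
            group = ''.join(reversed(parts))
            if stack:
                # Nested group: push it back as ONE element (brackets included),
                # so the enclosing ')' does not stop at this group's '('.
                stack.append('(' + group + ')')
            else:
                entries.append(group)
        elif ch == '(' or stack:
            # Step 1: push everything except ')'; text outside any bracket
            # is skipped because the stack is empty there.
            stack.append(ch)
    if stack:
        raise ValueError("unclosed '(' at end of input")
    return entries


sample = ("((Lorem) ipsum dolor sit amet)\n(consectetur adipiscing elit.)(Etiam\n"
          "suscipit\npulvinar congue.)\n((Vivamus) eu faucibus enim.)\n")
print(parse_with_stack(sample))
```

Pushing the nested group back as a single string, rather than character by character, is what keeps its '(' from being mistaken for the enclosing group's opening bracket.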

Alternatively, you could do the reverse: parse the string from the end and push everything except opening brackets.

For reference, have a look at this post: interactivepython.org/runestone/static/pythonds/BasicaDS/InfixPrefixandPostfixExpressions.html

The stack-based postfix and prefix methods described there are generally used for evaluating expressions.

Yank Leo