Python regexp: everything between parentheses on multiple lines

Question

My regexp is:

import re

TMP_REGEXP = r'_\(\s*(.*)\s*\)\s*$'
TMP_PATTERN = re.compile(TMP_REGEXP, re.MULTILINE)

File input_data.txt:

print _(
    'Test #A'
    )              

print _(
    '''Test #B'''
    '''Test #C'''
)

I am running it like this:

import codecs

with codecs.open('input_data.txt', encoding='utf-8') as flp:
    content = flp.read()

extracted = re.findall(TMP_PATTERN, content)

What I want to achieve is:

- take all characters that follow '_('
- stop reading characters when there is a ')' followed by zero or more whitespace characters and the end of the line

Interestingly, 'Test #A' works like a charm, but 'Test #B' is skipped.

Duplicate of http://stackoverflow.com/questions/587345/python-regular-expression-matching-a-multiline-block-of-text — Elixir Techne, Jun 08 '16 at 16:47

erip · Accepted Answer · 2016-06-08T19:00:42.363

4

This worked for me:

m = re.findall(r'(?s)_\((.*?)\)', content)

(?s) is the inline form of the re.DOTALL flag: it makes . match any character at all, including a newline.

_\( matches your desired start.

(.*?) lazily captures as few characters as possible, so the match stops at the first ) that follows.

\) matches your end.
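
To see what (?s) changes, here is a small sketch that runs the same pattern with and without it. The sample string is made up for illustration and is not the question's input file.

```python
import re

# Hypothetical sample: one _( ... ) block spanning several lines.
sample = "print _(\n    'one'\n    'two'\n)\n"

# Without (?s), '.' stops at a newline, so the lazy group can never
# reach the ')' on a later line and nothing matches.
without_dotall = re.findall(r'_\((.*?)\)', sample)

# With (?s) (equivalent to passing re.DOTALL), '.' also matches '\n'.
with_dotall = re.findall(r'(?s)_\((.*?)\)', sample)

print(without_dotall)
print(with_dotall)
```

This is also why the pattern in the question skips 'Test #B': its (.*) cannot cross the newline between the two string literals.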

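You might want to add \s*$ after \) (together with the re.MULTILINE flag, so that $ matches at the end of each line and not only at the end of the string) and strip the whitespace around each captured group.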
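
A minimal sketch of that variant, assuming the closing ) may only be followed by spaces or tabs before the line ends. The names pattern and extracted are illustrative, and the input is rebuilt from the question with the trailing spaces written out explicitly.

```python
import re

# The question's input, with trailing spaces after the first ')'.
content = (
    "print _(\n"
    "    'Test #A'\n"
    "    )              \n"
    "\n"
    "print _(\n"
    "    '''Test #B'''\n"
    "    '''Test #C'''\n"
    ")\n"
)

# (?s): '.' matches newlines; (?m): '$' matches at the end of every line.
# [ \t]* allows trailing spaces or tabs between ')' and the line end.
pattern = re.compile(r"(?ms)_\((.*?)\)[ \t]*$")

# Strip the leading/trailing newlines and indentation from each capture.
extracted = [body.strip() for body in pattern.findall(content)]

for i, body in enumerate(extracted, 1):
    print("Match {0}: {1}".format(i, body))
```

With this pattern a ) in the middle of a line no longer ends the match, but a ) that happens to end a line inside the block still would.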

>>> content = """
... print _(
...     'Test #A'
...     )              
... 
... print _(
...     '''Test #B'''
...     '''Test #C'''
... )
... """
>>> import re
>>> m = re.findall(r'(?s)_\((.*?)\)', content)
>>> for i, match in enumerate(m, 1):
...     print("Match {0}: {1}".format(i, match))
... 
Match 1: 
    'Test #A'

Match 2: 
    '''Test #B'''
    '''Test #C'''

>>>
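
If the text inside _( ... ) can itself contain balanced parentheses, the lazy pattern above stops at the first ). Python's re module has no construct for counting nesting depth, so one option is a small hand-written scanner. The sketch below is only an illustration: extract_calls and its marker parameter are hypothetical names, and it assumes parentheses are balanced, including any that appear inside string literals.

```python
def extract_calls(text, marker="_("):
    """Return the stripped text inside each marker...) call, honoring nesting."""
    results = []
    pos = text.find(marker)
    while pos != -1:
        start = pos + len(marker)
        depth = 1  # the marker's own '(' is already open
        i = start
        while i < len(text) and depth > 0:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        if depth != 0:
            break  # ran out of text before the call was closed
        # text[i - 1] is the ')' that closed the call; exclude it.
        results.append(text[start:i - 1].strip())
        pos = text.find(marker, i)
    return results


# Hypothetical input with a nested pair of parentheses.
nested = "print _(\n    'Test #B' ('a', 'b')\n    'Test #C'\n)\n"
found = extract_calls(nested)
print(found)
```

An unbalanced parenthesis inside a string literal would throw the depth count off; handling that would need a real tokenizer.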

edited Jun 08 '16 at 19:00

answered Jun 08 '16 at 16:53


There is one twist to figure out: what if there is Test #B ('a', 'b')\n Test #C? That is why I want to stop reading when there is a ) followed by nothing. But overall I am a step closer, thank you. – Drachenfels Jun 08 '16 at 17:00
@Drachenfels Then you can't do it with a regex. That is not a regular language, so a regex cannot match that language. – erip Jun 08 '16 at 17:01

That is __not__ a lookbehind. – Kenneth K. Jun 08 '16 at 17:02

Personally, I would say exactly what [the documentation](https://docs.python.org/2/library/re.html#re.S) says: `Make the '.' special character match any character at all, including a newline`. – Kenneth K. Jun 08 '16 at 17:06
