Fully match multi lines in Python using Regex

Question

I am trying to extract content that spans multiple lines. The content looks like this:

some content here
[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
 *something here
 *another something here
not relevant, should not be returned
[1/3/2015 - SSR] another one

There is always a space before the `*`.

The code I am using is:

re.search(r'.*- SSR](.*)',line,re.DOTALL)

The expected output is:

[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
 *something here
 *another something here
[1/3/2015 - SSR] another one

However, it only retrieves the first and third records, not the second one, because that record spans multiple lines. Can anybody help? I would really appreciate it.
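The question does not show where `line` comes from. Assuming it holds one line of the input at a time (the loop below is a hypothetical reconstruction, not the asker's actual code), this sketch shows why the `*` lines never appear: each call only sees a single line, so `re.DOTALL` has no newline to act on, and the continuation lines contain no `- SSR]` to match.

```python
import re

# The question's sample input, with the leading space before each "*"
text = ("some content here\n"
        "[1/1/2015 - SSR] something\n"
        "[1/2/2015 - SSR] another:\n"
        " *something here\n"
        " *another something here\n"
        "not relevant, should not be returned\n"
        "[1/3/2015 - SSR] another one")

# Hypothetical: the question's re.search(...) called once per line
results = []
for line in text.splitlines():
    m = re.search(r'.*- SSR](.*)', line, re.DOTALL)
    if m:
        results.append(m.group(1))

print(results)
```

Only the text after each header survives; the ` *` lines are never part of any match, which is why the search has to run over the whole text at once.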

http://stackoverflow.com/questions/587345/python-regular-expression-matching-a-multiline-block-of-text — rockerBOO, Mar 13 '15 at 18:55

Federico Piazza · Answer 1 · 2015-03-13T19:37:24.630

0

You can use a regex like this:

^.*?- SSR]([^[]*)

Match information:

MATCH 1
1.  [34-45] ` something
`
MATCH 2
1.  [61-111]    ` another:
*something here
*another something here
`
MATCH 3
1.  [127-139]   ` another one`

You can use something like this:

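import re

# [^[]* runs up to the next "[", so a record keeps all of its continuation lines
# (raw string instead of the Python 2-only ur'' prefix)
p = re.compile(r'^\[.*?- SSR]([^[]*)', re.DOTALL | re.MULTILINE)
test_str = "some content here\n[1/1/2015 - SSR] something\n[1/2/2015 - SSR] another:\n*something here\n*another something here\n[1/3/2015 - SSR] another one"

print(p.findall(test_str))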

If you also want the group to include the bracketed `[date - SSR]` prefix, you can use this regex instead:

^(\[.*?- SSR][^[]*)

Match information:

MATCH 1
1.  [18-45] `[1/1/2015 - SSR] something
`
MATCH 2
1.  [45-111]    `[1/2/2015 - SSR] another:
*something here
*another something here
`
MATCH 3
1.  [111-139]   `[1/3/2015 - SSR] another one`
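The answer gives only the pattern for this second variant, `^(\[.*?- SSR][^[]*)`. A sketch of calling it from Python, reusing the answer's own sample string (which has no space before `*`), might look like this; `.strip()` removes the trailing newline each match carries, as seen in the match ranges above.

```python
import re

# Same flags as the first variant; the group now starts at the "[" of the preamble
p = re.compile(r'^(\[.*?- SSR][^[]*)', re.DOTALL | re.MULTILINE)
test_str = ("some content here\n"
            "[1/1/2015 - SSR] something\n"
            "[1/2/2015 - SSR] another:\n"
            "*something here\n"
            "*another something here\n"
            "[1/3/2015 - SSR] another one")

# Strip the newline that [^[]* picks up before the next "["
records = [m.strip() for m in p.findall(test_str)]
for record in records:
    print(record)
```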

edited Mar 13 '15 at 19:37

answered Mar 13 '15 at 19:32

Federico Piazza

Thanks a lot! One thing: there will always be a space before the *, and it appears the lines starting with * weren't returned. How do I modify the regex? – user3238319 Mar 13 '15 at 19:53
@user3238319 it is also capturing the blank before the `*`. Check here https://regex101.com/r/mG9qG1/3 – Federico Piazza Mar 13 '15 at 19:58
As with the solution above, I tried adding another line that is not supposed to match, but it is also returned. – user3238319 Mar 13 '15 at 20:43
@user3238319 I'm not sure what you are testing on your end, my answer is related to the question you posted. – Federico Piazza Mar 13 '15 at 20:47
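Both observations in this exchange can be checked by running the answer's pattern on the question's exact input, leading spaces and the "not relevant" line included. In this minimal sketch the second group keeps the ` *` lines with their spaces, but it also swallows the "not relevant" line, because `[^[]*` stops only at the next `[`.

```python
import re

# The question's input: a space before each "*" and an unrelated line
text = ("some content here\n"
        "[1/1/2015 - SSR] something\n"
        "[1/2/2015 - SSR] another:\n"
        " *something here\n"
        " *another something here\n"
        "not relevant, should not be returned\n"
        "[1/3/2015 - SSR] another one")

# The answer's first pattern, unchanged
p = re.compile(r'^\[.*?- SSR]([^[]*)', re.DOTALL | re.MULTILINE)
groups = p.findall(text)
for g in groups:
    print(repr(g))
```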

score 0 · Answer 2 · answered Mar 13 '15 at 20:00

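Assuming the record text itself can contain square brackets, you can match the whole `[date - SSR]` preamble and then use a lookahead, which matches without consuming anything, to stop right before the next preamble. The `\Z` alternative is needed for the last record, which has no preamble after it.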

import re

s = """[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
*something here
*another something here
[1/3/2015 - SSR] another one"""

print('string to process')
print(s)
print()
print('matches')
# Match a preamble, then lazily consume text until the next preamble
# (zero-width lookahead) or the end of the string (\Z)
matches = re.findall(
    r'\[\d+/\d+/\d+ - SSR\].*?(?:(?=\[\d+/\d+/\d+ - SSR\])|\Z)',
    s, re.MULTILINE | re.DOTALL)
for i, match in enumerate(matches, 1):
    print("%d: %s" % (i, match.strip()))

The output is

string to process
[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
*something here
*another something here
[1/3/2015 - SSR] another one

matches
1: [1/1/2015 - SSR] something
2: [1/2/2015 - SSR] another:
*something here
*another something here
3: [1/3/2015 - SSR] another one
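The bracket assumption is what separates this answer from the `[^[]*` approach. A small sketch with a hypothetical record whose text contains `[note]` shows the difference: the character-class pattern stops at the first `[` after the preamble, while the lookahead pattern runs on to the next full preamble.

```python
import re

# Hypothetical input: the first record's text contains square brackets
s = "[1/1/2015 - SSR] see [note] here\n[1/2/2015 - SSR] another one"

# Character-class approach: stops at the first "[" after the preamble
char_class = re.findall(r'^(\[.*?- SSR][^[]*)', s, re.MULTILINE | re.DOTALL)

# Lookahead approach: stops only at the next full "[d/d/d - SSR]" or the end
lookahead = re.findall(
    r'\[\d+/\d+/\d+ - SSR\].*?(?:(?=\[\d+/\d+/\d+ - SSR\])|\Z)',
    s, re.MULTILINE | re.DOTALL)

print([m.strip() for m in char_class])
print([m.strip() for m in lookahead])
```

The first list truncates the record at `see`; the second keeps `see [note] here` intact.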

answered Mar 13 '15 at 20:00

tdelaney

I tried adding another line that is not supposed to match, but it is also returned. – user3238319 Mar 13 '15 at 20:43
@user3238319 - that's another requirement, then. Can you show me the line that shouldn't match? If you have a complicated set of rules, regex may not be the best choice. For instance, parsing a Python script is too complicated for regex, although regex is used to tokenize it. – tdelaney Mar 13 '15 at 20:47
@user3238319 - oh wait, I see. It's that "not relevant" line. Do the matching lines really start with " *"? What's unique between the matching and non-matching parts? – tdelaney Mar 13 '15 at 20:49
The non-matching lines won't start with [xxx] or *. – user3238319 Mar 13 '15 at 20:51
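Neither answer covers this last rule. Taking the thread's rules at face value (a record starts with a `[... - SSR]` line, continuation lines start with a space and `*`, anything else is ignored), one possible sketch matches each header line plus only the ` *` lines directly after it. This is an illustration under those assumptions, not something either answerer posted.

```python
import re

# The question's input, including the line that must not be returned
text = ("some content here\n"
        "[1/1/2015 - SSR] something\n"
        "[1/2/2015 - SSR] another:\n"
        " *something here\n"
        " *another something here\n"
        "not relevant, should not be returned\n"
        "[1/3/2015 - SSR] another one")

# ^\[[^\]\n]*- SSR\].*   a header line (no DOTALL, so "." stops at the newline)
# (?:\n \*.*)*            then zero or more " *" continuation lines
pattern = re.compile(r'^\[[^\]\n]*- SSR\].*(?:\n \*.*)*', re.MULTILINE)
records = pattern.findall(text)
print("\n".join(records))
```

Joining the records with newlines reproduces the expected output from the question, and the "not relevant" line is skipped because it neither starts with `[` nor follows a header as a ` *` line.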