Finding substrings in string with multiple '\n's

Question

My goal is to find the piece of text between search_term_start and search_term_end. The problem I'm having is that I can only accomplish this if I use a string without '\n' characters. The code below raises an AttributeError.

import re

logs = 'cut-this-out \n\n givemethisstring \n\n and-this-out-too'

search_term_start = '''cut-this-out'''
search_term_end = '''and-this-out-too'''

total_pages = re.search(search_term_start + '(.*)' + search_term_end, logs)
print(total_pages.group(1))

If I remove the '\n' characters from logs, the program runs how I intend it to:

import re

logs = 'cut-this-out givemethisstring and-this-out-too'

search_term_start = '''cut-this-out'''
search_term_end = '''and-this-out-too'''

total_pages = re.search(search_term_start + '(.*)' + search_term_end, logs)
print(total_pages.group(1))

I can't seem to search for substrings in a string if it has '\n' characters. How can I retrieve this substring and save it without removing the '\n's from the original string?

score 1 · Accepted Answer · answered Nov 17 '20 at 08:50

re.DOTALL is exactly the flag you are looking for.

The Python documentation describes it as follows: "Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline. Corresponds to the inline flag (?s)."
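
A minimal sketch of what that means for the string in the question. Without the flag, '.' cannot get past the first newline, so re.search returns None, and calling .group(1) on None is what raises the AttributeError. The variable names no_flag, with_flag and inline are only illustrative.

```python
import re

logs = 'cut-this-out \n\n givemethisstring \n\n and-this-out-too'
pattern = 'cut-this-out(.*)and-this-out-too'

# Without DOTALL, '.' cannot cross the newlines, so there is no match.
no_flag = re.search(pattern, logs)

# With the flag, '.' also matches '\n'.
with_flag = re.search(pattern, logs, re.DOTALL)

# The inline form (?s) at the start of the pattern has the same effect.
inline = re.search('(?s)' + pattern, logs)

print(no_flag)                                # None
print(repr(with_flag.group(1)))               # ' \n\n givemethisstring \n\n '
print(inline.group(1) == with_flag.group(1))  # True
```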

Try this:

import re

logs = 'cut-this-out \n\n givemethisstring \n\n and-this-out-too'

search_term_start = '''cut-this-out'''
search_term_end = '''and-this-out-too'''


c = re.compile(search_term_start + r'(.*)' + search_term_end, re.DOTALL)
print(c.search(logs).group(1))
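
If the start and end markers might contain regex metacharacters, or might not be present at all, a slightly more defensive variant could look like the sketch below. The helper name text_between is made up for this example; re.escape and the None check are the only additions to the approach above.

```python
import re

def text_between(text, start, end):
    """Return the text between start and end, or None if there is no match."""
    # re.escape makes the markers literal even if they contain characters
    # such as '.', '+' or '(' that are special in a regex.
    pattern = re.escape(start) + r'(.*)' + re.escape(end)
    match = re.search(pattern, text, re.DOTALL)
    # re.search returns None when nothing matches, so check before .group().
    return match.group(1) if match else None

logs = 'cut-this-out \n\n givemethisstring \n\n and-this-out-too'
result = text_between(logs, 'cut-this-out', 'and-this-out-too')

print(repr(result))    # ' \n\n givemethisstring \n\n '
print(result.strip())  # givemethisstring
```

The original string is left untouched; .strip() is applied only to the captured piece, in case the surrounding whitespace and newlines are not wanted.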

S.B


If there are multiple 'and-this-out-too' occurrences, it prints everything up to the last one. How can I stop at the first instance of search_term_end? – mimky Nov 17 '20 at 09:34
@mimky Use the non-greedy version: `r'(.*?)'` instead of `r'(.*)'`. – S.B Nov 17 '20 at 09:40
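
To illustrate the difference, here is a small sketch using a made-up log string in which the end marker appears twice. The greedy group runs to the last marker, while the non-greedy one stops at the first.

```python
import re

# Hypothetical input: the end marker occurs twice.
logs = 'cut-this-out \n first \n and-this-out-too \n second \n and-this-out-too'

# Greedy: '.*' consumes as much as possible, so it ends at the LAST marker.
greedy = re.search(r'cut-this-out(.*)and-this-out-too', logs, re.DOTALL)

# Non-greedy: '.*?' consumes as little as possible, so it ends at the FIRST marker.
lazy = re.search(r'cut-this-out(.*?)and-this-out-too', logs, re.DOTALL)

print(repr(greedy.group(1)))  # ' \n first \n and-this-out-too \n second \n '
print(repr(lazy.group(1)))    # ' \n first \n '
```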
