My goal is to find the piece of text between search_term_start and search_term_end. The problem I'm having is that I can only accomplish this if I use a string without '\n' characters. The code below raises an AttributeError.

import re

logs = 'cut-this-out \n\n givemethisstring \n\n and-this-out-too'

search_term_start = '''cut-this-out'''
search_term_end = '''and-this-out-too'''

total_pages = re.search(search_term_start + '(.*)' + search_term_end, logs)
print(total_pages.group(1))

If I remove the '\n' characters from logs, the program runs how I intend it to:

import re

logs = 'cut-this-out givemethisstring and-this-out-too'

search_term_start = '''cut-this-out'''
search_term_end = '''and-this-out-too'''

total_pages = re.search(search_term_start + '(.*)' + search_term_end, logs)
print(total_pages.group(1))

I can't seem to search for substrings in a string if it has '\n' characters. How can I retrieve this substring and save it without removing the '\n's from the original string?

Tomerikoo
mimky

1 Answer

re.DOTALL is exactly the flag you are looking for. From the Python documentation for the re module:

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline. Corresponds to the inline flag (?s).

Try this:

import re

logs = 'cut-this-out \n\n givemethisstring \n\n and-this-out-too'

search_term_start = '''cut-this-out'''
search_term_end = '''and-this-out-too'''


c = re.compile(search_term_start + r'(.*)' + search_term_end, re.DOTALL)
print(c.search(logs).group(1))
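
As a minimal sketch of why the original code fails and of the inline form of the flag: without DOTALL, '.' cannot cross a newline, so re.search returns None, and calling .group(1) on None is what raises the AttributeError. The variable names no_flag and inline are only illustrative and are not part of the question.

```python
import re

logs = 'cut-this-out \n\n givemethisstring \n\n and-this-out-too'

# Without DOTALL, '.' stops at the first newline, so nothing matches.
# re.search then returns None, and None has no .group() method.
no_flag = re.search('cut-this-out(.*)and-this-out-too', logs)
print(no_flag)  # None

# The inline flag (?s) at the start of the pattern has the same effect
# as passing re.DOTALL as the flags argument.
inline = re.search('(?s)cut-this-out(.*)and-this-out-too', logs)
print(repr(inline.group(1)))  # ' \n\n givemethisstring \n\n '
```

The captured group keeps the surrounding whitespace and newlines; call .strip() on it if you only want 'givemethisstring'.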
S.B
  • If there are multiple 'and-this-out-too' occurrences, it prints everything up to the last one. How can I stop at the first instance of search_term_end? – mimky Nov 17 '20 at 09:34
  • @mimky Use the non-greedy version: `r'(.*?)'` instead of `r'(.*)'` – S.B Nov 17 '20 at 09:40
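
To illustrate the greedy versus non-greedy behaviour discussed in the comments, here is a small sketch. The sample string with two end markers is made up for the example, and wrapping the search terms in re.escape is an optional precaution added here (it is not in the original answer) in case the terms ever contain regex metacharacters.

```python
import re

# Hypothetical log text containing the end marker twice.
logs = 'cut-this-out \n A \n and-this-out-too \n B \n and-this-out-too'

# re.escape makes the search terms match literally (an added assumption).
start = re.escape('cut-this-out')
end = re.escape('and-this-out-too')

# Greedy: '.*' consumes as much as possible, so the match runs to the LAST end marker.
greedy = re.search(start + r'(.*)' + end, logs, re.DOTALL)

# Non-greedy: '.*?' consumes as little as possible, so the match stops at the FIRST end marker.
lazy = re.search(start + r'(.*?)' + end, logs, re.DOTALL)

print(repr(greedy.group(1)))  # ' \n A \n and-this-out-too \n B \n '
print(repr(lazy.group(1)))    # ' \n A \n '
```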