I have a file containing output that repeats the following pattern:

!-----------------------------------------------------------------
line 1
line 2
line 3
.....
-------------------------------------------------------------------!

I am trying to match and extract all occurrences of these blocks, but the code below returns the whole file:

match = re.search(r'\!-.*-\!', data, re.DOTALL)
print match.group()
A D
  • Try putting a question mark after the `*`: `r'\!-.*?-\!'`. Also, I don't know why there would be a need to escape exclamation marks, as far as I know they have no special function in regular expressions. – L3viathan Aug 29 '17 at 14:35
  • @L3viathan That didn't work either...I still get the whole file both on group() and group(0). – A D Aug 29 '17 at 14:42
  • Make sure to use the `re.DOTALL` flag, as shown in Honza Zíka's answer. [Demo of the working pattern](https://regex101.com/r/zlGpBt/1) – L3viathan Aug 29 '17 at 14:44
  • @L3viathan Please see my code in the question. I do use DOTALL – A D Aug 29 '17 at 14:45
  • a) Use `!-.*?-!` instead of your pattern. b) Use `re.findall` (gives you a list) or `re.finditer` (gives you an iterator; maybe better for huge files) instead of `re.search`. – L3viathan Aug 29 '17 at 14:47
  • @L3viathan Thanks for the comments...Finally that made it work `match = re.findall(r'!-.*?-!', data, re.DOTALL)` – A D Aug 29 '17 at 14:56

1 Answer

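Quantifiers in Python regexes are greedy by default, meaning `.*` will consume as many characters as possible while still allowing the overall pattern to match. With `re.DOTALL`, that stretches from the first `!-` to the last `-!` in the file. You can make the quantifier lazy (non-greedy) by using `*?`, so it stops at the nearest closing `-!`: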

match = re.search(r'\!-.*?-\!', data, re.DOTALL)
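
As the comments on the question point out, `re.search` returns only the first match, so extracting every block also needs `re.findall` or `re.finditer`. Below is a minimal sketch of the difference between the greedy and lazy patterns. The contents of `data` are a made-up sample shaped like the blocks in the question, not the asker's real file.

```python
import re

# Hypothetical sample input mirroring the block layout in the question.
data = """header text
!-----------------
line 1
line 2
-----------------!
text between blocks
!-----------------
line 3
-----------------!
trailer text
"""

# Greedy: .* runs from the first "!-" to the last "-!", giving one big span.
greedy = re.findall(r'!-.*-!', data, re.DOTALL)

# Lazy: .*? stops at the nearest "-!", so each block is matched separately.
blocks = re.findall(r'!-.*?-!', data, re.DOTALL)

print(len(greedy), "greedy match(es)")
print(len(blocks), "lazy match(es)")

# finditer yields match objects one at a time instead of building a list,
# which may be preferable when the input is large.
for m in re.finditer(r'!-.*?-!', data, re.DOTALL):
    print(m.start(), m.end())
```

The backslashes before `!` are dropped here because `!` has no special meaning in a Python regex, so the escaped and unescaped forms behave the same.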
Honza Zíka