Python Regex: How Can I find Recurring Blocks of Texts in a Text File

Question

I'm trying to parse repeating blocks of text that all begin with '----BEGIN---' and end with '---END', using Python. So the text file will look like below. Basically, I want to be able to find each block (words, numbers, and special characters) and parse them for further analysis. The code below is as close as I have gotten, but it returns the entire document, not each block. Any help would be appreciated.

import re

block_search = re.compile('----BEGIN---.*---END', re.DOTALL)
with open(file, 'r', encoding='utf-8') as f:
    text = f.read()
    result = re.findall(block_search, text)

----BEGIN--- Words Special Character Numbers words Special character words numbers words words. words numbers words Special character words numbers words words words numbers words words ---END

----BEGIN--- Words words numbers words Special character words numbers words words. words numbers words Special character words numbers words words words numbers words words ... ---END

score 0 · Accepted Answer · answered Jul 22 '21 at 20:35

'----BEGIN---.*---END' matches everything from the first occurrence of ----BEGIN--- to the last occurrence of ---END, because .* is greedy and consumes as much text as it can. To match each block separately, use the non-greedy .*? instead: it consumes as little as possible, so the match ends at the first ---END that follows each ----BEGIN---.

block_search = re.compile('----BEGIN---.*?---END',re.DOTALL)
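
To make the difference concrete, here is a minimal sketch comparing the two patterns. The sample string is hypothetical and only stands in for the file contents; the capturing-group variant at the end is an optional extra, not part of the original answer, for when only the text between the markers is wanted.

```python
import re

# Hypothetical sample text standing in for the contents of the file.
text = (
    "----BEGIN--- first block, 123 !? ---END\n"
    "\n"
    "----BEGIN--- second block\nspanning two lines ---END\n"
)

# Greedy: .* runs to the LAST ---END, so everything collapses into one match.
greedy = re.compile(r"----BEGIN---.*---END", re.DOTALL)
greedy_matches = greedy.findall(text)

# Non-greedy: .*? stops at the FIRST ---END after each ----BEGIN---.
lazy = re.compile(r"----BEGIN---.*?---END", re.DOTALL)
lazy_matches = lazy.findall(text)

# Optional: a capturing group makes findall return only the text
# between the markers, which is handy for further parsing.
inner = re.compile(r"----BEGIN---(.*?)---END", re.DOTALL)
bodies = [body.strip() for body in inner.findall(text)]

print(len(greedy_matches), len(lazy_matches))
for body in bodies:
    print(repr(body))
```

With re.DOTALL, the dot also matches newlines, so blocks that span several lines are still found as a single match each.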

ThePyGuy

That got me 90% of the way there. What I don't understand now is that with re.findall() it does not find every instance of the block. – Clovis Jul 22 '21 at 20:42
Yeah, you were missing `?` only. For the sample data you have, it is finding both the occurrences. – ThePyGuy Jul 22 '21 at 20:44
No. I understood you there. There was a different problem with my code that prevented it from finding the following iterations of the blocks. Thanks for the help! – Clovis Jul 22 '21 at 20:48
