I'm trying to parse nested HTML using Python's re module, and I've run into a roadblock with my regex pattern.
I have HTML text that contains nested <div> elements, and I want to extract the text content of the innermost ones. Here's the code I'm using:
import re
html_text = """
<div>
<div>
<div>
Innermost Content 1
</div>
</div>
<div>
Innermost Content 2
</div>
</div>
"""
pattern = r'<div>(.*?)</div>'
result = re.findall(pattern, html_text, re.DOTALL)
print(result)
I expected this code to return the content of the innermost elements, like this:
['Innermost Content 1', 'Innermost Content 2']
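But that's not what I get. The first match starts at the outermost <div> and stops at the first </div>, so it includes the nested opening <div> tags, and every match still has the surrounding newlines. What am I doing wrong with my regex pattern, and how can I fix it to get the desired result? Any help would be greatly appreciated!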
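
For reference, here is a minimal sketch of one possible direction. It assumes the innermost <div> elements contain only text (no other tags) and that the tags carry no attributes. Replacing `.*?` with `[^<]*` means the captured group can never cross another tag, so only a <div> with no tags inside it can match. The `strip()` call is an added step to drop the surrounding newlines. This is not a general HTML parser, and a real parser such as the standard library's `html.parser` is likely a safer choice for anything less regular than this sample.

```python
import re

html_text = """
<div>
<div>
<div>
Innermost Content 1
</div>
</div>
<div>
Innermost Content 2
</div>
</div>
"""

# [^<]* cannot cross a "<", so the group can only span a <div>
# whose body contains no other tags (assumption: text-only innermost divs).
pattern = r'<div>([^<]*)</div>'

# Strip the newlines that surround each captured piece of text.
result = [content.strip() for content in re.findall(pattern, html_text)]
print(result)  # ['Innermost Content 1', 'Innermost Content 2']
```

With this pattern, re.DOTALL is no longer needed, because a negated character class such as `[^<]` already matches newlines.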