1

How would I match the text below to get two strings:

  1. from the first title to the third closing a tag
  2. from the second title to the sixth closing a tag (and so on: third title to the ninth closing a tag, etc.)

Here is the string to be matched:

title
<a></a>
content here
<a></a>
text...
<a></a>
text...
title 
<a></a>
<a></a>
<a></a>

I tried using .*, but this captured everything from the first title to the last a tag.

Archetype2

2 Answers

1
from re import findall, DOTALL

text = '''
title
<a></a>
content here
<a></a>
text...
<a></a>
text...
title 
<a></a>
<a></a>
<a></a>
'''
print(findall(r'title.*?</a>.*?</a>.*?</a>', text, DOTALL))

gives

['title\n<a></a>\ncontent here\n<a></a>\ntext...\n<a></a>', 'title \n<a></a>\n<a></a>\n<a></a>']

You can also use a repeated non-capturing group:

print(findall(r'title(?:.*?</a>){3}', text, DOTALL))
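
One thing worth knowing about the (?:...) in that second pattern: the group is non-capturing on purpose. As far as I understand findall, if a pattern contains a capturing group it returns the group's text instead of the whole match, and a repeated group only keeps its last repetition. A small sketch of the difference (the names whole and last_only are just for illustration):

```python
import re

text = (
    "title\n<a></a>\ncontent here\n<a></a>\ntext...\n<a></a>\ntext...\n"
    "title \n<a></a>\n<a></a>\n<a></a>\n"
)

# Non-capturing group: findall returns each whole match.
whole = re.findall(r'title(?:.*?</a>){3}', text, re.DOTALL)

# Capturing group: findall returns only what the group held on its
# last repetition, which is usually not what you want here.
last_only = re.findall(r'title(.*?</a>){3}', text, re.DOTALL)

print(whole)
print(last_only)
```

To get the 4th, 5th, ... block, nothing changes: each match starts at the next title after the previous match ended.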
Akinakes
0

Generally, * is greedy (it matches as much as possible), while *? is reluctant (it matches as little as possible). Try replacing .* with .*?.
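
A minimal sketch of the difference, using a one-line sample string that I made up for illustration (it has no newlines, so no DOTALL flag is needed here; with the multi-line text from the question, . would not cross line breaks without it):

```python
import re

# Hypothetical single-line sample, not the asker's exact input.
sample = "title <a></a> x <a></a> y <a></a> title <a></a> <a></a> <a></a>"

# Greedy: .* runs to the end and backtracks to the LAST </a>.
greedy = re.findall(r'title.*</a>', sample)

# Reluctant: .*? stops at the FIRST </a> after each title.
lazy = re.findall(r'title.*?</a>', sample)

print(greedy)
print(lazy)
```

The greedy pattern gives a single match spanning the whole string, which is the behaviour described in the question; the reluctant one gives one short match per title.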

Hyperboreus