I'm trying to get the innerHTML of every link using the following code:
import re
s = '<div><a href="page1.html" title="page1">Go to page 1</a>, <a href="page2.html" title="page2">Go to page 2</a><a href="page3.html" title="page3">Go to page 3</a>, <a href="page4.html" title="page4">Go to page 4</a></div>'
match = re.findall(r'<a.*>(.*)</a>', s)
for string in match:
    print(string)
But I'm only getting the last occurrence, "Go to page 4". I think it's seeing one big string with several regex matches inside it, which are treated as overlapping and ignored. So how do I get a collection that matches the following?
['Go to page 1', 'Go to page 2', 'Go to page 3', 'Go to page 4']
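One likely cause, sketched here rather than stated as the definitive fix: `.*` is greedy, so `<a.*>` runs from the first `<a` to the last `>` that still leaves a `</a>` to match, which swallows every link except the final one. The snippet below compares the original pattern with a non-greedy variant on a shortened, two-link version of the string. The names `greedy` and `lazy` are only illustrative, and regexes remain fragile on real-world HTML, where a proper parser is usually the safer choice.

```python
import re

# Shortened sample modelled on the string in the question (two links only).
s = ('<div><a href="page1.html" title="page1">Go to page 1</a>, '
     '<a href="page2.html" title="page2">Go to page 2</a></div>')

# Original pattern: the greedy .* consumes everything up to the last opening tag.
greedy = re.findall(r'<a.*>(.*)</a>', s)

# Variant: [^>]* stays inside a single opening tag, and .*? stops at the
# first closing </a> instead of the last one.
lazy = re.findall(r'<a[^>]*>(.*?)</a>', s)

print(greedy)  # ['Go to page 2']
print(lazy)    # ['Go to page 1', 'Go to page 2']
```

With the variant, `re.findall` returns one captured group per link, which is the kind of list shown above.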