I have been trying to match tag names only (without the < and > signs) in the case of regular tags:
<w:tag w:attrib1="http://url" w:attrib2="anyValue">
without matching solo tags (self-closing tags):
<w:tag2 w:attrib1="anyValue" w:attrib2="http://url" />
(please pay attention to the URLs in the attributes, as they contain forward slashes (/))
but I could not get it to work with:
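import re
string = '<w:tag w:attrib1="http://url" w:attrib2="anyValue"><w:tag2 w:attrib1="anyValue" w:attrib2="http://url" />'
regex = re.compile(r'(?<=<)w:\w+(?=[\w\W]+>)(?!\s/>)')
print(regex.findall(string))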
getting this:
['w:tag', 'w:tag2']
expecting this:
['w:tag']
any thoughts?
Cheers.
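
A possible direction, offered as a sketch rather than a definitive fix: in the pattern above, (?!\s/>) is tested immediately after the tag name, where the next characters are a space and the first attribute, so it never sees the trailing " />". One alternative is to consume the attributes up to the closing > and require that the > is not directly preceded by a slash. The sketch below assumes attribute values never contain a literal < or >; the names OPEN_TAG, open_tag_names and sample are made up for illustration.

```python
import re

# Assumption: attribute values never contain a literal '<' or '>'.
# <(w:\w+)   capture the tag name right after '<'
# [^<>]*     skip the attributes (slashes inside URLs are allowed here)
# (?<!/)>    the closing '>' must not be directly preceded by '/'
OPEN_TAG = re.compile(r'<(w:\w+)[^<>]*(?<!/)>')

def open_tag_names(text):
    """Return the names of regular tags, skipping self-closing ones."""
    return OPEN_TAG.findall(text)

sample = (
    '<w:tag w:attrib1="http://url" w:attrib2="anyValue">'
    '<w:tag2 w:attrib1="anyValue" w:attrib2="http://url" />'
)
print(open_tag_names(sample))  # ['w:tag']
```

Because the pattern contains a single capture group, findall returns only the tag names. Closing tags such as </w:tag> are skipped as well, since the name there follows "</" rather than "<".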