I have been trying to use regex to parse an XML-style string that looks like this:
Input
"Joe Doe got a <span class="procedure">X ray</span> <- in April blah blah <span <- class="disease">lacerations</span> blah <span <- class="anatomy">kidney</span>."
For each span I want to capture three groups: the whole element ("<span class="blah">blah</span>"), the class, and the text content.
For example, for
<span class="procedure">X ray</span>
the three matches should be: <span class="procedure">X ray</span>, procedure, and X ray.
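A minimal sketch of one way to get all three groups for every span. It assumes each opening tag has the form <span class="..."> with a double-quoted class and no nested spans, and it assumes the "<-" markers in the input above stand for line breaks (they are written as \n in the hypothetical sample below). re.finditer walks every match in the string; group(0) is the whole element, group(1) the class and group(2) the text content. The helper name find_spans is hypothetical.

```python
import re

# Hypothetical sample input: the "<-" markers from the question are assumed
# to be line breaks, so they are written as "\n" here.
xml = ('Joe Doe got a <span class="procedure">X ray</span>\n'
       'in April blah blah <span\n'
       'class="disease">lacerations</span> blah <span\n'
       'class="anatomy">kidney</span>.')

# \s+     lets "class=" sit on the line after "<span"
# [^"]*   captures the class value and stops at the closing quote
# .*?     is lazy, so each match ends at the nearest </span>
# DOTALL  lets "." cross newlines inside the element text
SPAN_RE = re.compile(r'<span\s+class="([^"]*)">(.*?)</span>', re.DOTALL)


def find_spans(text):
    """Return (whole element, class, text content) for every span in text."""
    return [(m.group(0), m.group(1), m.group(2)) for m in SPAN_RE.finditer(text)]


for whole, cls, content in find_spans(xml):
    print(whole, cls, content, sep=' | ')
```

If a list of triples is all you need, wrapping the whole pattern in one extra pair of parentheses and calling re.findall returns the same (element, class, content) tuples directly, since findall returns one tuple of groups per match.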
So far I have been able to use re.search('<.+?>', xml)
to find <span class="procedure">.
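The search call above only ever reports one tag because re.search stops at the first match; re.findall (or re.finditer) keeps scanning and returns every non-overlapping match. A small sketch on a shortened, hypothetical sample string:

```python
import re

# Shortened, hypothetical sample taken from the start of the input.
xml = 'Joe Doe got a <span class="procedure">X ray</span> in April'

# re.search returns only the first match (or None if there is none).
first = re.search(r'<.+?>', xml)
print(first.group(0))

# re.findall keeps going and returns a list of every match, here every tag.
all_tags = re.findall(r'<.+?>', xml)
print(all_tags)
```

Note that the same lazy pattern also picks up the closing </span> tag, which is why matching the whole element needs a pattern that mentions both the opening and closing tags.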
I also tried re.search('<.+?>+', xml), but I had no luck finding the other strings; instead it gave <span class="procedure">X ray</span> <- in April>
which isn't the desired result either.
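One likely reason the second attempt did not help: in '<.+?>+' the trailing + applies only to the '>' directly before it, so the lazy pattern still ends at the first '>'. Dropping the ? makes .+ greedy, and it then runs all the way to the last '>' in the string. The sketch below compares these on a shortened, hypothetical sample, together with a pattern that covers exactly one whole element (it assumes no nested spans).

```python
import re

# Shortened, hypothetical sample with two spans.
xml = 'got a <span class="procedure">X ray</span> blah <span class="anatomy">kidney</span>.'

# The "+" repeats only ">", so this still stops at the first ">".
lazy_plus = re.search(r'<.+?>+', xml).group(0)

# Greedy ".+" backtracks from the end of the string to the LAST ">",
# swallowing both spans and the text between them.
greedy = re.search(r'<.+>', xml).group(0)

# [^>]* stays inside the opening tag; lazy .*? stops at the first </span>.
element = re.search(r'<span[^>]*>.*?</span>', xml).group(0)

for label, match in [('lazy_plus', lazy_plus), ('greedy', greedy), ('element', element)]:
    print(label, '->', match)
```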