Consider this example, which I ran on Python 2.7:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
tstr = r''' <div class="thebibliography">
<p class="bibitem" ><span class="biblabel">
[1]<span class="bibsp"> </span></span><a
id="Xtester"></a><span
class="cmcsc-10">A<span
class="small-caps">k</span><span
class="small-caps">e</span><span
class="small-caps">g</span><span
class="small-caps">c</span><span
class="small-caps">t</span><span
class="small-caps">o</span><span
class="small-caps">r</span>,</span>
<span
class="cmcsc-10">P. D.</span><span
class="cmcsc-10"> H. </span> testöng ... . <span
class="cmti-10">Draftin:</span>
<a
href="http://www.example.com/test.html" class="url" ><span
class="cmitt-10">http://www.example.com/test.html</span></a> (2001).
</p>
</div>
'''
# remove <a id>
tout2 = re.sub(r'''<a[\s]*?id=['"].*?['"][\s]*?></a>''', " ", tstr, re.DOTALL)
# remove class= in <a
regstr = r'''(<a.*?)(class=['"].*?['"])([\s]*>)'''
print( re.findall(regstr, tout2, re.DOTALL)) # finds
print("------") #
print( re.sub(regstr, "AAAAAAA", tout2, re.DOTALL )) # does nothing?
When I run this, the first regex is substituted as expected (the `<a id="Xtester"></a>` element is gone); then in the output I get:
[('<a\nhref="http://www.example.com/test.html" ', 'class="url"', ' >')]
This suggests that the second regex is written correctly (all three groups are found). But when I try to replace the whole match with "AAAAAAA", nothing changes in that part of the output:
------
<div class="thebibliography">
<p class="bibitem" ><span class="biblabel">
[1]<span class="bibsp"> </span></span> <span
class="cmcsc-10">A<span
class="small-caps">k</span><span
class="small-caps">e</span><span
class="small-caps">g</span><span
class="small-caps">c</span><span
class="small-caps">t</span><span
class="small-caps">o</span><span
class="small-caps">r</span>,</span>
<span
class="cmcsc-10">P. D.</span><span
class="cmcsc-10"> H. </span> testöng ... . <span
class="cmti-10">Draftin:</span>
<a
href="http://www.example.com/test.html" class="url" ><span
class="cmitt-10">http://www.example.com/test.html</span></a> (2001).
</p>
</div>
Clearly, there is no "AAAAAAA" here, though I'd expect one.
What is the problem, and what should I do to get `re.sub` to replace the matches that have apparently been found?
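One thing that may be worth checking, offered as a sketch rather than a definitive diagnosis: the signature is `re.sub(pattern, repl, string, count=0, flags=0)`, so a fourth positional argument is taken as `count`, not `flags`. That would also explain why the first substitution works, since `[\s]` matches newlines without `re.DOTALL`, while the second pattern relies on `.` crossing a line break. The `html` string and the variable names below are hypothetical, a shortened stand-in for the input above.

```python
import re

# Hypothetical, shortened input: only the line break inside the <a> tag matters.
html = '<a\nhref="http://www.example.com/test.html" class="url" ><span>x</span></a>'
pattern = r'''(<a.*?)(class=['"].*?['"])([\s]*>)'''

# A fourth positional argument to re.sub binds to count, so a call like
# re.sub(regstr, "AAAAAAA", tout2, re.DOTALL) amounts to the following.
# No flags are set, "." cannot cross the newline, and nothing matches.
unchanged = re.sub(pattern, "AAAAAAA", html, count=re.DOTALL)

# Passing the flag by keyword actually enables DOTALL.
replaced = re.sub(pattern, "AAAAAAA", html, flags=re.DOTALL)

print(unchanged == html)
print(replaced)
```

`re.findall(pattern, string, flags=0)` has no `count` parameter, so there the third positional argument really is `flags`, which would be why `findall` reports a match while `sub` does not.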