I'm trying to replace all instances of href="../directory" with href="../directory/index.html".
In Python, this
reg = re.compile(r'<a href="../(.*?)">')
for match in re.findall(reg, input_html):
    output_html = input_html.replace(match, match+'index.html')
produces the following output:
href="../personal-autonomy/index.htmlindex.htmlindex.htmlindex.html"
href="../paternalism/index.html"
href="../principle-beneficence/index.htmlindex.htmlindex.html"
href="../decision-capacity/index.htmlindex.htmlindex.html"
Any idea why it works for the second link but not for the others?
Relevant part of the source:
<p>
<a href="../personal-autonomy/">autonomy: personal</a> |
<a href="../principle-beneficence/">beneficence, principle of</a> |
<a href="../decision-capacity/">decision-making capacity</a> |
<a href="../legal-obligation/">legal obligation and authority</a> |
<a href="../paternalism/">paternalism</a> |
<a href="../identity-personal/">personal identity</a> |
<a href="../identity-ethics/">personal identity: and ethics</a> |
<a href="../respect/">respect</a> |
<a href="../well-being/">well-being</a>
</p>
EDIT: The repeated 'index.html' is actually the result of multiple matches. For example, href="../personal-autonomy/index.htmlindex.htmlindex.htmlindex.html" appears because ../personal-autonomy is found four times in the original source.
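The effect can be reproduced with a small sketch. The two-link sample string here is made up for illustration, and the loop is assumed to accumulate its result in output_html rather than start from input_html on every pass:

```python
import re

# Hypothetical sample: the same link target appears twice in the page.
input_html = (
    '<a href="../personal-autonomy/">autonomy: personal</a> | '
    '<a href="../personal-autonomy/">autonomy</a>'
)

reg = re.compile(r'<a href="../(.*?)">')
# With one capture group, findall returns the group text once per match,
# so a link that occurs twice shows up twice in the list.
matches = re.findall(reg, input_html)

output_html = input_html
for match in matches:
    # str.replace rewrites every occurrence of the substring, including
    # ones that already had 'index.html' appended on an earlier pass.
    output_html = output_html.replace(match, match + 'index.html')

print(matches)
print(output_html)
```

Each duplicate match triggers another global replace, so a link that occurs n times ends up with n copies of 'index.html'.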
As a general regex question, how would you replace all instances so that each match gets 'index.html' appended only once?
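One possible approach, offered as a sketch rather than a definitive answer, is to let re.sub do the substitution in a single pass instead of combining re.findall with str.replace. The function name add_index and the sample string are hypothetical, and the pattern assumes every link to rewrite has the form href="../something/" with a trailing slash, as in the source excerpt:

```python
import re

# Capture the whole relative path, which must start with ../ and end with /.
HREF_DIR = re.compile(r'href="(\.\./[^"]*/)"')

def add_index(html):
    # Each match is rewritten exactly once, no matter how many times the
    # same path appears in the document.
    return HREF_DIR.sub(r'href="\1index.html"', html)

sample = (
    '<a href="../paternalism/">paternalism</a> | '
    '<a href="../paternalism/">paternalism again</a>'
)
result = add_index(sample)
print(result)
```

Because the rewritten links no longer end in a slash, running add_index on its own output leaves it unchanged under the assumption above.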