Although encoding & as &amp; is the standard way, if you really need to avoid the conversion for some reason, you can do this:
Step 1. Find a unique string that does not occur in your HTML source. You can simply use ANDamp; as your reserved_amp variable if you are confident that the string "ANDamp;" will not appear in your HTML source. Otherwise, generate a random alphanumeric string and check that it does not occur in your HTML source:
>>> import random
>>> import string
>>> length = 15  # increase the length if collisions still occur
>>> reserved_amp = "&amp;"
>>> html = """<a href="https://www.example.com/?param1=value1&param2=value2">link</a>"""
>>> # regenerate until the placeholder is neither "&amp;" nor a substring of html
>>> while reserved_amp == "&amp;" or reserved_amp in html:
...     reserved_amp = ''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(length)) + "amp;"  # the "amp;" suffix makes it easy to spot
...
>>> print(reserved_amp)
2eya6oywxg5z7q5amp;
Step 2. Replace all occurrences of & before parsing:
>>> html = html.replace("&", reserved_amp)
>>> html
'<a href="https://www.example.com/?param1=value12eya6oywxg5z7q5amp;param2=value2">link</a>'
>>>
Step 3. Replace it back only if you need the original form:
>>> from lxml import etree
>>> parser = etree.HTMLParser()
>>> tree = etree.fromstring(html, parser)
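>>> # encoding="unicode" makes tostring() return str rather than bytes on Python 3
>>> etree.tostring(tree, encoding="unicode").replace(reserved_amp, "&")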
'<html><body><a href="https://www.example.com/?param1=value1&param2=value2">link</a></body></html>'
>>>
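The three steps above can be folded into one helper. This is only a rough sketch: make_placeholder, with_protected_amp and fake_serializer are hypothetical names that do not come from lxml, and fake_serializer merely stands in for any parse-and-serialize step that escapes bare ampersands (such as the lxml round trip shown above).

```python
import random
import string


def make_placeholder(html, length=15):
    """Return a random token ending in 'amp;' that does not occur in html."""
    alphabet = string.ascii_lowercase + string.digits
    while True:
        token = "".join(random.choice(alphabet) for _ in range(length)) + "amp;"
        if token not in html:
            return token


def with_protected_amp(html, process):
    """Swap every '&' for a placeholder, run process(), then swap back."""
    token = make_placeholder(html)
    return process(html.replace("&", token)).replace(token, "&")


def fake_serializer(text):
    # Hypothetical stand-in for a serializer that escapes bare ampersands.
    return text.replace("&", "&amp;")


source = '<a href="https://www.example.com/?param1=value1&param2=value2">link</a>'
print(fake_serializer(source))                      # the ampersand gets escaped
print(with_protected_amp(source, fake_serializer))  # the ampersand survives
```

To use it with lxml, you would pass a function that parses the string and returns etree.tostring(tree, encoding="unicode") as the process argument.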
[UPDATE]:
The semicolon at the end of reserved_amp is a safeguard.
Suppose we generated a reserved_amp like this: ampXampXampXampX + amp;
and the HTML contains: yyYampX&
It would be encoded as: yyYampXampXampXampXampXamp;
Even so, decoding cannot produce a wrong result such as yyY&ampX (the original is yyYampX&). The final semicolon is not an ASCII letter or digit, so it can never be generated from string.ascii_lowercase + string.digits above, and a placeholder match can only end at that semicolon.
So, as long as the random part never contains a semicolon (or any other character that is not a letter or digit) and the semicolon is appended at the end (it MUST be the last character), there is no need to worry about yyYampX& being reverted to yyY&ampX.
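The effect of the trailing semicolon can be checked with a short experiment. This is just an illustrative sketch using the example strings from this update; the variable names unsafe and safe are made up here, and the placeholder is fixed by hand rather than randomly generated.

```python
original = "yyYampX&"

# Placeholder WITHOUT a terminator: decoding matches too early.
unsafe = "ampXampXampXampX"
encoded = original.replace("&", unsafe)
decoded_unsafe = encoded.replace(unsafe, "&")
print(decoded_unsafe)  # yyY&ampX, which is not the original

# Same placeholder WITH the "amp;" suffix: the semicolon pins the match.
safe = unsafe + "amp;"
encoded = original.replace("&", safe)
decoded_safe = encoded.replace(safe, "&")
print(decoded_safe)    # yyYampX&
```

Without the suffix, the leftmost match starts inside the original text and the ampersand lands in the wrong place. With it, the only position where the whole placeholder matches is the one where it was inserted.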