I have to replace every bare & in an input string with its named entity (&amp;) or its decimal entity (&#38;). The input string may already contain other named and decimal entities, which also start with & and must be left unchanged.
Code:
import re
text = ' At&T, &quot; &lt; I am &gt; , At&amp;T so &#60; &lt; & &amp; '
# Collect all named entities and decimal entities.
replace_tmp = re.findall(r"&#\d+;|&[a-z]+;", text)
# Swap each entity for a temporary placeholder.
tmp_dict = {}
count = 1
for i in replace_tmp:
    text = text.replace(i, "$%d$" % count)
    tmp_dict["$%d$" % count] = i
    count += 1
# Replace the remaining bare & with &amp;
text = text.replace("&", "&amp;")
# Restore the original entities from the placeholders.
for i in tmp_dict:
    text = text.replace(i, tmp_dict[i])
print(text)
Final Output: At&amp;T, &quot; &lt; I am &gt; , At&amp;T so &#60; &lt; &amp; &amp;
But can I get a regular expression that does the above directly?
Final line in my .py file:
value = re.sub(r'&(?!(#[0-9]+;|[a-zA-Z]+;))', '&amp;', value).replace("<", "&lt;").replace(">", "&gt;").replace('"', "&quot;")
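The negative-lookahead approach in that final line can be sketched as a small function. This is only an illustration: the names `BARE_AMP` and `escape_bare_ampersands` and the sample string are assumptions, not part of the original code.

```python
import re

# Match "&" only when it does NOT begin a decimal entity (e.g. &#60;)
# or a purely alphabetic named entity (e.g. &lt;). The lookahead
# checks the following characters without consuming them.
BARE_AMP = re.compile(r'&(?!#[0-9]+;|[a-zA-Z]+;)')


def escape_bare_ampersands(text):
    """Replace each bare "&" with "&amp;", leaving existing entities alone."""
    return BARE_AMP.sub('&amp;', text)


# Hypothetical sample mixing bare ampersands with existing entities.
sample = 'At&T, &quot; &lt; I am &gt; , At&amp;T so &#60; &lt; & &amp;'
print(escape_bare_ampersands(sample))
```

Because the lookahead only recognises decimal references and alphabetic names, a hexadecimal reference such as `&#x3C;` or a name containing digits such as `&frac12;` would be treated as a bare `&` and escaped again; the pattern would need extending if the input can contain those.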