I want to escape the unescaped data inside a xml string e.g.
string = "<tag parameter = "something">I want to escape these >, < and &</tag>"
to
"<tag parameter = "something">I want to escape these >, < and &</tag>"
- Now, I definitely can't use any xml parsing libraries like xml.dom.minidom or xml.etree because the data is unescaped & will give error
In regex, I figure out way to match & get start and end positions of data substing
exp = re.search(">.+?</", label) # Get position of the data between tags start = exp.start() + 1 end = exp.end() - 2 return label[ : start] + saxutils.escape(label[start : end]) + label[end : ]
But in re.search, I can't match the exact xml format
- If I use re.findall I can't get positions of the substrings found
- I could always find positions of found substring by index but that won't be efficient, I want a simple but efficent solution
- BeautifulSoup solutions are welcomed but I wish there was some more beautiful way to do it with python's basic libraries