I want to remove all subscripts from a piece of HTML code, except the subscript “rep”.
For instance, the string "t<sub>i</sub>(10) = 23, p<sub>rep</sub>=.2"
should become: "t(10) = 23, p<sub>rep</sub>=.2"
I tried things like:
txt <- "t<sub>i</sub>(10) = 23, p<sub>rep</sub>=.2"
gsub(pattern="<sub>(?!rep).*</sub>",replacement="",txt,perl=TRUE)
But the problem is that this line of code deletes everything from the first <sub> to the last </sub> in the HTML file. For the example above it returns "t=.2".
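One possible direction, as a minimal sketch rather than a definitive answer: the greedy `.*` runs to the last </sub>, so a lazy `.*?` might be what is needed, together with a lookahead that checks for the whole subscript "rep</sub>". This assumes subscripts are never nested and never span several lines; `strip_subs` is a hypothetical helper name introduced only for illustration.

```r
txt <- "t<sub>i</sub>(10) = 23, p<sub>rep</sub>=.2"

# Hypothetical helper (not part of the original attempt).
# (?!rep</sub>) rejects only a subscript whose entire content is "rep";
# .*? is lazy, so each match stops at the nearest </sub>.
strip_subs <- function(x) {
  gsub("<sub>(?!rep</sub>).*?</sub>", "", x, perl = TRUE)
}

strip_subs(txt)
# "t(10) = 23, p<sub>rep</sub>=.2"
```

Because the lookahead includes the closing tag, a subscript such as "reps" would still be removed; only the exact subscript "rep" is kept.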