I have a string variable that contains a mix of human language and HTML. I would like to delete all of the HTML portions enclosed in "<" and ">". I tried the following:

gsub("\\<[^\\<]*\\>", "", subject, perl=TRUE);

But I was told that \< is not a valid escape. Can anyone help me with this problem? Many thanks!

xinyuanliu
  • "<font size=6>Done with payin good ol Sallie Mae for my learnin at the institushin.</font>" This is what one observation looks like. How can I get it to just "Done with payin good ol Sallie Mae for my learnin at the institushin."? – xinyuanliu Aug 29 '17 at 17:34
  • What is this comment? – M-- Aug 29 '17 at 17:45
  • Edit your question to include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with input and desired output. – MrFlick Aug 29 '17 at 17:47

1 Answer

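gsub can remove the <tags> and keep the content between them. Neither < nor > is a special character in a regular expression, so they need no escaping. The non-greedy pattern <.*?> matches each tag separately rather than everything from the first < to the last >.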

> str
[1] "<font size=6>Done with payin good ol Sallie Mae for my learnin at the institushin.</font>"

> gsub("<.*?>","", str)
[1] "Done with payin good ol Sallie Mae for my learnin at the institushin."
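A common alternative is a negated character class, which does not rely on non-greedy matching. The following is only a minimal sketch: the `strip_tags` helper name and the sample strings in `subject` are made up for illustration and are not from the question.

```r
# Hypothetical sample input; the strings are illustrative only.
subject <- c(
  "<font size=6>Done with payin good ol Sallie Mae.</font>",
  "Plain text with <b>bold</b> and <i>italic</i> parts",
  "no markup here"
)

# "<[^>]*>" matches "<", then any run of characters other than ">", then ">".
# Neither "<" nor ">" needs a backslash in the pattern.
strip_tags <- function(x) gsub("<[^>]*>", "", x)

# gsub is vectorised, so every element of the vector is cleaned in one call.
cleaned <- strip_tags(subject)
print(cleaned)
```

Note that a regular expression is not an HTML parser: text that contains a literal < or > outside of a tag can be mangled, so this approach is best suited to simple, well-formed markup like the example above.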
Sagar