In R, I have a string containing a fragment of HTML like the one below:
"</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\"> <a href=\"group.php?g=1\">XXXX</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\"> <a href=\"category.php?c=100050\">YYYY</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\"> <a href=\"category.php?c=100050&brand=Motorola\">ZZZZ</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\">AAAA"
I want to use gsub() to strip out the HTML tags so that the output is:
XXXX YYYY ZZZZ AAAA
I tried this pattern, as shown here:
<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
but it fails. Why?
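A minimal sketch of what the attempted pattern does to this string in R. It assumes the pattern is written as an R string literal (backslashes doubled) and uses perl = TRUE so that the lazy .*? and the \1 backreference behave as in PCRE. The names x, pat, no_match and res are just for illustration.

```r
# The HTML fragment from the question, as an R string literal.
x <- "</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\"> <a href=\"group.php?g=1\">XXXX</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\"> <a href=\"category.php?c=100050\">YYYY</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\"> <a href=\"category.php?c=100050&brand=Motorola\">ZZZZ</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\">AAAA"

# The attempted pattern, with backslashes doubled for an R string.
pat <- "<([A-Z][A-Z0-9]*)\\b[^>]*>(.*?)</\\1>"

# [A-Z] is case-sensitive, so the lowercase <a> and <img> tags never match.
no_match <- !grepl(pat, x, perl = TRUE)
print(no_match)

# Even with ignore.case = TRUE, the pattern matches a whole element together
# with its text (<a ...>XXXX</a>), so gsub() deletes the words you want to keep.
# The <img ...> tags have no closing </img>, so they are left in place.
res <- gsub(pat, "", x, perl = TRUE, ignore.case = TRUE)
cat(res, "\n")
```

In other words, the pattern is built to match a paired element including its content, whereas the goal here is to delete only the tags themselves and keep the text between them.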
How can I do it in R? Thanks.
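One possible approach, sketched under the assumption that the fragment is as simple as the one above (no literal ">" inside attribute values, no comments or script blocks): delete every tag with a pattern that matches "<", then anything except ">", then ">", and tidy up the leftover whitespace. The variable names no_tags and out are illustrative only.

```r
# The HTML fragment from the question, as an R string literal.
x <- "</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\"> <a href=\"group.php?g=1\">XXXX</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\"> <a href=\"category.php?c=100050\">YYYY</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\"> <a href=\"category.php?c=100050&brand=Motorola\">ZZZZ</a> <img src=\"images/arrow_orange.gif\" width=\"8\" height=\"12\">AAAA"

# Remove every tag: "<", one or more characters that are not ">", then ">".
no_tags <- gsub("<[^>]+>", "", x)

# Removing the tags leaves runs of spaces; collapse them and trim the ends.
out <- trimws(gsub("[[:space:]]+", " ", no_tags))

print(out)
#> [1] "XXXX YYYY ZZZZ AAAA"
```

The single spaces in the result come from the spaces already present between the tags in the source string. For real-world HTML, a parser (for example the xml2 or rvest package) is more robust than regular expressions.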