One for the regex enthusiasts. I have a vector of strings in the format:
<TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Verdana" STYLE="font-size: 10px" size="10" COLOR="#FF0000" LETTERSPACING="0" KERNING="0">Desired output string containing any symbols</FONT></P></TEXTFORMAT>
I'm aware of the perils of parsing this sort of markup with regex. It would, however, be useful to know how to efficiently extract a sub-string from within a larger string match, i.e. the text between the closing > of the opening FONT tag and the < of its closing tag. The best I can do is:
library(stringr)
strng <- str_extract(strng, "<FONT.*FONT>") # select the whole FONT element
strng <- str_extract(strng, ">.*<")         # keep from the first > to the last <
strng <- str_extract(strng, "[^<>]+")       # strip the surrounding angle brackets
What would be the simplest formula to achieve this in R?
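For reference, here is a minimal sketch of the kind of single-step extraction I have in mind, using a capture group. It assumes each string contains exactly one FONT element, that the text does not span multiple lines, and that there are no nested FONT tags; the names `inner` and `inner_base` are only illustrative.

```r
library(stringr)

# Sample input in the format shown above
strng <- '<TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Verdana" STYLE="font-size: 10px" size="10" COLOR="#FF0000" LETTERSPACING="0" KERNING="0">Desired output string containing any symbols</FONT></P></TEXTFORMAT>'

# str_match() returns a character matrix: column 1 is the full match,
# column 2 is the first capture group (the text inside the FONT element).
inner <- str_match(strng, "<FONT[^>]*>(.*?)</FONT>")[, 2]

# Base R equivalent: replace the whole string with the captured group.
# Note that sub() returns the input unchanged when there is no match.
inner_base <- sub(".*<FONT[^>]*>(.*?)</FONT>.*", "\\1", strng, perl = TRUE)

print(inner)
```

Both calls are vectorised, so they should apply unchanged to a whole vector of such strings; with `str_match()`, strings that do not match give `NA`.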