I am trying to pick out only the URL that starts with /~/ and ends with .ashx, which appears within the quotes, from the complete HTML source file that I have scraped. I tried the function below to get the list of href matches.
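library(XML)  # provides htmlTreeParse() and xpathSApply()

processHTML <- function(html) {
  # Parse the scraped HTML into an internal document
  doc <- htmlTreeParse(html, useInternalNodes = TRUE)
  # Return the href attribute of every <a> element
  xpathSApply(doc, "//a/@href")
}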
From the output below, I need to pick only the path itself, excluding the href label and the quotation marks, i.e. /~/media/McKinsey/Business Functions/Marketing and Sales/Our Insights/Discussions in digital Whats a marketing ecosystem/Discussions-in-digital-Marketings-ecosystem.ashx
The output currently looks like this:
href "/~/media/McKinsey/Business Functions/Marketing and Sales/Our Insights/Discussions in digital Whats a marketing ecosystem/Discussions-in-digital-Marketings-ecosystem.ashx"
Please help me with a regular expression for the above problem.
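To make the goal concrete, here is a minimal sketch of the kind of result I am after, using only base R. The `hrefs` vector is a hypothetical stand-in for the values returned by the function above (the second and third entries are made up for illustration), and the patterns are just one possible way to express "starts with /~/ and ends with .ashx", not necessarily the best one.

```r
# Hypothetical input: stand-in for the href values returned by processHTML()
hrefs <- c(
  "/~/media/McKinsey/Business Functions/Marketing and Sales/Our Insights/Discussions in digital Whats a marketing ecosystem/Discussions-in-digital-Marketings-ecosystem.ashx",
  "/about-us",                       # made-up entry that should be dropped
  "http://example.com/page.html"     # made-up entry that should be dropped
)

# Keep only the values that start with "/~/" and end with ".ashx"
ashx_links <- grep("^/~/.*\\.ashx$", unname(hrefs), value = TRUE)

# If only the raw text line (label plus quoted value) is available,
# pull out the part between the quotes that matches the same shape
raw_line <- paste0('href "', hrefs[1], '"')
from_raw <- regmatches(raw_line, regexpr('/~/[^"]*\\.ashx', raw_line))

print(ashx_links)
print(from_raw)
```

If the result of `xpathSApply()` is a named character vector, the `href` label and the quotes may only be how R prints it, in which case `unname()` plus the `grep()` filter might already be enough.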