In R XML Xpath, @href is returning the text "href"

Question

I am trying to extract the value of the href attribute with XPath, as described in these two posts. Unfortunately, the result includes the literal text "href" and several spaces in addition to the URL. How can I get just the URL?

library(XML)

html <- readLines("http://www.msu.edu")
html.parse <- htmlParse(html)
Node <- getNodeSet(html.parse, "//div[@id='MSU-top-utilities']//a/@href")
Node[[1]]

# > Node[[1]]
#                  href 
# "students/index.html" 
# attr(,"class")
# [1] "XMLAttributeValue"

score 5 · Accepted Answer · answered Oct 03 '15 at 02:38

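It's just a named character vector: "href" is the name of the element, and the extra spaces are only padding that R adds when it prints a named vector. To drop the name, you can do: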

as.character(Node[[1]])

which will give you

## [1] "students/index.html"
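
To get every link rather than just the first, something along these lines should work. This is only a sketch: the inline HTML is a made-up stand-in for the page in the question (so it runs without network access), and the variable names are my own.

```r
library(XML)

# Hypothetical stand-in for the live page, so the example is reproducible
html <- '<html><body><div id="MSU-top-utilities">
  <a href="students/index.html">Students</a>
  <a href="alumni/index.html">Alumni</a>
</div></body></html>'

doc <- htmlParse(html, asText = TRUE)

# Each element is a named character vector of class "XMLAttributeValue"
attrs <- getNodeSet(doc, "//div[@id='MSU-top-utilities']//a/@href")

# as.character() strips the "href" name and the class from each element
hrefs <- vapply(attrs, as.character, character(1))

# Alternative: select the <a> nodes and read the attribute from each one
hrefs2 <- xpathSApply(doc, "//div[@id='MSU-top-utilities']//a",
                      xmlGetAttr, "href")
```

Both `hrefs` and `hrefs2` should hold the bare URLs. The second form avoids the attribute XPath entirely, which sidesteps the named-vector result.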

Alternatively, here's a cleaner idiom using the xml2 package, where xml_attr() returns a plain character vector directly:

library(xml2)

doc <- read_html("http://www.msu.edu")
nodes <- xml_find_all(doc, "//div[@id='MSU-top-utilities']//a")
xml_attr(nodes, "href")

## [1] "students/index.html"      "faculty-staff/index.html" "alumni/index.html"       
## [4] "businesses/index.html"    "visitors/index.html"
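
If you want to try the xml2 approach without hitting the network, here is a small sketch on an inline snippet. The HTML is a hypothetical stand-in for the real page, and the base URL passed to url_absolute() is an assumption based on the address in the question.

```r
library(xml2)

# Hypothetical stand-in for the live page
html <- '<html><body><div id="MSU-top-utilities">
  <a href="students/index.html">Students</a>
  <a href="alumni/index.html">Alumni</a>
</div></body></html>'

doc <- read_html(html)

# Select the <a> nodes, then read the attribute from each
nodes <- xml_find_all(doc, "//div[@id='MSU-top-utilities']//a")
hrefs <- xml_attr(nodes, "href")

# The attribute XPath from the question also works; xml_text() returns the values
hrefs_xpath <- xml_text(xml_find_all(doc, "//div[@id='MSU-top-utilities']//a/@href"))

# Resolve the relative links against the page URL (assumed base)
abs_hrefs <- url_absolute(hrefs, "http://www.msu.edu/")
```

The url_absolute() step is optional, but the hrefs on that page are relative, so you will probably need it if you plan to follow the links.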
