I have a problem similar to "Scraping a web page, links on a page, and forming a table with R". I would have posted this as a comment on that question, but I do not have enough reputation yet.
I have the following code:
## Load rvest (provides read_html(), html_nodes() and the %>% pipe)
library(rvest)
## Import web page
FAO_Countries <- read_html("http://www.fao.org/countryprofiles/en/")
## Extract the URLs I am interested in (selector found with SelectorGadget)
FAO_Countries_urls <- FAO_Countries %>%
  html_nodes(".linkcountry") %>%
  html_attr("href")
## Extract the link text (country names) with the same selector
FAO_Countries_links <- FAO_Countries %>%
  html_nodes(".linkcountry") %>%
  html_text()
## Create a data frame from the two previous objects
FAO_Countries_data <- data.frame(FAO_Countries_links = FAO_Countries_links,
                                 FAO_Countries_urls = FAO_Countries_urls,
                                 stringsAsFactors = FALSE)
At this point, I would like to extract text from each of the URLs I have collected and add it as a new column on the right, and then do the same for other pieces of information I need. However, when I run
FAO_Countries_data_text <- FAO_Countries_data$FAO_Countries_urls %>%
  html_nodes("#foodSecurity-1") %>%
  html_text()
I get the following error message:
Error in UseMethod("xml_find_all") :
no applicable method for 'xml_find_all' applied to an object of class "character"
In other words, I cannot scrape the pages behind the URLs stored in the newly created data frame.
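As far as I understand the message, html_nodes() expects a parsed document (what read_html() returns), while the column only holds character strings, and relative ones at that. Here is a minimal sketch of the difference. The base URL is my assumption about how the relative paths resolve, and the inline HTML string is a made-up stand-in for a country page so that the example runs without network access.

```r
library(rvest)
library(xml2)

# Relative paths, as stored in FAO_Countries_data$FAO_Countries_urls
urls <- c("/countryprofiles/index/en/?iso3=AFG",
          "/countryprofiles/index/en/?iso3=ALB")

# Resolve them against the page they were scraped from (assumed base URL)
full_urls <- url_absolute(urls, "http://www.fao.org/countryprofiles/en/")

# html_nodes() works on a parsed document, not on a character string.
# Hypothetical stand-in for one country page:
page <- read_html('<div id="foodSecurity-1">Family farming</div>')

food_security <- page %>%
  html_nodes("#foodSecurity-1") %>%
  html_text()
```

So each URL would presumably have to go through read_html() first, one page at a time, before any selector can be applied.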
Now, I have a dataframe that appears as follows:
> head(FAO_Countries_data, n=3)
  FAO_Countries_links                  FAO_Countries_urls
1         Afghanistan /countryprofiles/index/en/?iso3=AFG
2             Albania /countryprofiles/index/en/?iso3=ALB
3             Algeria /countryprofiles/index/en/?iso3=DZA
I would like to expand this data frame by adding columns with information found at the various URLs, e.g.:
  FAO_Countries_links                  FAO_Countries_urls  Food_security
1         Afghanistan /countryprofiles/index/en/?iso3=AFG Family farming
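To make the desired result concrete, this is roughly the shape of what I am after. It is only a sketch: scrape_text() and add_section() are hypothetical helper names of my own, the base URL is assumed, and the reader argument exists only so that the page-loading step can be swapped out (for example, for testing without hitting the site).

```r
library(rvest)

# Hypothetical helper: load one page and return the text of the first node
# matching `css`; NA if the page cannot be read or nothing matches.
scrape_text <- function(url, css, reader = read_html) {
  page <- tryCatch(reader(url), error = function(e) NULL)
  if (is.null(page)) return(NA_character_)
  html_text(html_node(page, css), trim = TRUE)
}

# Hypothetical helper: add one column per CSS selector to the data frame.
add_section <- function(data, column, css,
                        base = "http://www.fao.org",  # assumed site root
                        reader = read_html) {
  full_urls <- xml2::url_absolute(data$FAO_Countries_urls, base)
  data[[column]] <- vapply(full_urls, scrape_text, character(1),
                           css = css, reader = reader, USE.NAMES = FALSE)
  data
}

# Intended usage (one HTTP request per country, so it may be slow):
# FAO_Countries_data <- add_section(FAO_Countries_data,
#                                   "Food_security", "#foodSecurity-1")
```

Calling add_section() once per selector would then add the other columns I need in the same way.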