I have a link that contains a table. The first thing I tried was checking whether there is any button I could click to get the data, and unfortunately there isn't. Then I tried to use the XML package
in R to fetch the data from the relevant nodes and build a data frame myself.
To do this, I need to know which node (or HTML tag) I want to extract. So I right-clicked in the web browser and found the tag that contains the table I want.
The content of the table starts at <fieldset id="result" and, in the browser, we can also see that the first row of the table is <li class="vesselResultEntry removeBackground">.
Then, when I used R to download this HTML, I found that all the <li> tags relating to the table were gone, replaced by <li class="toRemove"/>. Here is my R code, by the way:
library(XML)
url <- "http://www.fao.org/figis/vrmf/finder/search/#stats"
webpage <- readLines(url)
htmlpage <- htmlParse(webpage, asText = TRUE)
data <- xpathSApply(htmlpage, "//ul[@id='searchResultsContainer']")
data
# <ul id="searchResultsContainer" class="clean resultsContainer"><li class="toRemove"></li></ul>
All I'm trying to do in the code is check whether I can fetch the content of a specific tag. Clearly, the row I want to fetch is not in the object (webpage) I saved.
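To make that check concrete, here is a minimal sketch of what I mean. It does not touch the live site: the two HTML strings are made up for illustration (one mirrors the output shown above, the other mirrors what the browser inspector shows, with placeholder row text), and count_rows is just a hypothetical helper name.

```r
library(XML)

# Hypothetical stand-ins, not real downloads:
# `downloaded` mirrors what R returned; `rendered` mirrors what the browser shows.
downloaded <- '<html><body><ul id="searchResultsContainer" class="clean resultsContainer"><li class="toRemove"></li></ul></body></html>'
rendered <- '<html><body><ul id="searchResultsContainer" class="clean resultsContainer"><li class="vesselResultEntry removeBackground">placeholder row</li></ul></body></html>'

# Count the <li> rows with the vesselResultEntry class inside the results list.
count_rows <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  rows <- getNodeSet(
    doc,
    "//ul[@id='searchResultsContainer']/li[contains(@class, 'vesselResultEntry')]"
  )
  length(rows)
}

count_rows(downloaded)  # no table rows in the HTML that R received
count_rows(rendered)    # the row is present in the browser's version
```

Running count_rows on the real webpage object from my code above gives the same result as the first call here, which is how I know the rows are missing from what R downloaded.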
So my questions are:
Is there a way to download the table I want by any means (ideally in R)?
Is there some kind of protection on this website that prevents me from downloading the whole HTML as a text file and extracting the data from it?
Any suggestions would be much appreciated.