
I have a simple piece of R code that reads HTML data from a website, and I am trying to loop through the pages and get the data from each one. I have used this code numerous times and it works: it adds the results from each page to an R variable. For some reason it won't work on this site. Any ideas?

library(XML)
library(RCurl)


data <- NULL

getData <- function(url) {
  # For some reason I can't read directly from the site, so I use RCurl to get the data first
  xData <- getURL(url)
  table <- data.frame(readHTMLTable(xData)$'NULL')
  data <- table
}

getData(url="https://steemdb.com/accounts/reputation?page=1")
Kharoof
  • Try `x <- getData(url="https://steemdb.com/accounts/reputation?page=1")`; `x` then contains the data. – Indi Nov 17 '16 at 09:18
  • How about adding `return(data)` to your function? I wouldn't advise mixing global environment and function environment. – statespace Nov 17 '16 at 09:23

1 Answer


I think I know what is wrong.

Change `data <- table` to `data <<- table` within your function.

With `<-` you are assigning the result to a variable in the function's local environment, which is discarded when the function returns, whereas `<<-` assigns to the existing `data` in the enclosing (here, global) environment.
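To make the difference concrete, here is a minimal base R sketch. The function names `assign_local`, `assign_global` and `return_value` are made up for illustration and are not part of the original code; the last variant follows the suggestion in the comments to return the value rather than touch the global environment.

```r
data <- NULL

# Local assignment: `data` inside the function is a new, local variable.
assign_local <- function() {
  data <- "from function"
  invisible(NULL)
}
assign_local()
still_null <- is.null(data)  # the outer `data` has not been changed

# Superassignment: `<<-` searches the enclosing environments and
# modifies the `data` defined outside the function.
assign_global <- function() {
  data <<- "from function"
  invisible(NULL)
}
assign_global()

# Returning the value avoids relying on outside state at all.
return_value <- function() {
  "from function"
}
result <- return_value()
```

Of the three, returning the value is usually the easiest to reason about, because the caller decides where the result is stored.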

Alternatively, I would suggest the following, which returns the result instead of assigning it globally:

library(rvest)
getData <- function(url) { html_table(read_html(url)) }

data <- getData("https://steemdb.com/accounts/reputation?page=1")

Or, even better, loop over the pages:

library(rvest)
getData <- function(url) { html_table(read_html(url)) }
steemdb.url <- "https://steemdb.com/accounts/reputation?page="

data <- lapply(1:100, function(i) getData(paste0(steemdb.url, i)))
data <- do.call(rbind, data)
View(data)

`1:100` will get you the first 100 pages.
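As a rough sketch of how the page loop can be made a little more robust: `paste0()` is vectorised, so all page URLs can be built in one call, and wrapping each fetch in `tryCatch()` keeps one failing page from aborting the whole run. The helper `fetch_all` and the offline stand-in `fake_fetch` are hypothetical names introduced here; in real use you would pass `getData` instead of `fake_fetch`.

```r
steemdb.url <- "https://steemdb.com/accounts/reputation?page="

# Build the URLs for pages 1 to 3 in a single vectorised call.
urls <- paste0(steemdb.url, 1:3)

# Hypothetical helper: apply `fetch` to every URL and return NULL
# for any page whose fetch raises an error.
fetch_all <- function(urls, fetch) {
  lapply(urls, function(u) tryCatch(fetch(u), error = function(e) NULL))
}

# Offline stand-in for getData(), so the sketch runs without network access.
fake_fetch <- function(u) data.frame(url = u, stringsAsFactors = FALSE)

results <- fetch_all(urls, fake_fetch)
```

Pages that failed show up as `NULL` entries in `results`, which `do.call(rbind, ...)` ignores when combining data frames.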
dimitris_ps
  • Thanks dimitris_ps. One correction to your code: `do.call(rbind, data)` needs to be done twice for this to work. Thanks for the help. – Kharoof Nov 18 '16 at 07:18
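A likely reason for the double `do.call(rbind, data)`: `html_table()` called on a whole page returns a list of tables rather than a single data frame, so the `lapply()` result is a list of lists. The sketch below reproduces that shape offline with a made-up stand-in, `fake_get_data`, and flattens one level before binding. This is one possible alternative to calling `rbind` twice, not what the answer itself does.

```r
# Stand-in for html_table(read_html(url)): a LIST holding one data frame.
fake_get_data <- function(i) {
  list(data.frame(page = i,
                  account = paste0("user", i),
                  stringsAsFactors = FALSE))
}

pages <- lapply(1:3, fake_get_data)        # list of 3 lists, each with 1 data frame
flat <- unlist(pages, recursive = FALSE)   # list of 3 data frames
combined <- do.call(rbind, flat)           # a single data frame
```

If every page is known to contain exactly one table, taking the first element inside the scraping function (for example with `[[1]]`) would give the same shape, but that is an assumption about the site's markup.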