
I am new to R and am currently trying to scrape some data from the web. The problem is that I want the code to run every five minutes and to store the data in a dataframe after each run. All the scraped data should be stored in the same dataframe.

Example: There is production data on the website that I want to scrape into R:

A1      A2
100     200

These data are updated every 5 minutes. What I want is that every time they are updated (or the code is run), the new data are appended to the same dataframe.

Result I want:
A1      A2     Time
100     200    28/02/2020 15:45:45
103     199    28/02/2020 15:50:45
90      194    28/02/2020 15:55:45
……….

At the moment I only have code that overwrites the results every time it is run. The code I have right now looks like this:

library(rvest)
library(xml2)
library(plyr)

url <- "myurl"
content <- read_html(url)
dfNEW = data.frame()
Result <- content %>%
  html_node("#gauge")                         %>% 
  html_attrs()                                %>%
  `[`(c("dataA1", "dataA2"))
df <- as.data.frame(t(Result))
rownames(df) <- c()
df$Time <- Sys.time()

total <- rbind.fill(dfNEW, df)

Do you have any idea how I can make the loop do what I want?

Thanks in advance!

JaneJane
  • You might want to look at this: https://stackoverflow.com/questions/1174799/how-to-make-execution-pause-sleep-wait-for-x-seconds-in-r . At the end of your loop, tell R to wait for 5 minutes. – SebSta Feb 28 '20 at 08:03
  • Thanks! But my question is more about how to append new data to the dataframe every time I run the code. – JaneJane Feb 28 '20 at 08:05
  • The last line should be updating the final data frame every 5 minutes, not creating a new one. So something like: `total <- rbind.fill(total, dfNEW)`. The `total` data.frame needs to be initialized first, before the looping procedure. – Edward Feb 28 '20 at 08:30

1 Answer

A loop might look like this:

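library(plyr)  # provides rbind.fill()

# initialise the accumulating data frame once, before the loop
dfNEW <- data.frame()

for (i in seq(100)) {

  # code to generate new df: re-run the scraping step from the question
  # here, so that Result holds fresh values on every iteration
  df <- as.data.frame(t(Result))
  rownames(df) <- c()
  df$Time <- Sys.time()

  # append the new row to the rows collected so far
  dfNEW <- rbind.fill(dfNEW, df)

  # wait five minutes before the next run
  Sys.sleep(5*60)
}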

So in each iteration of the loop you have to rbind the new row of the data.frame onto the already existing one, rather than onto a freshly created empty one.
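To make the accumulation pattern easy to try without a live website, here is a minimal sketch that swaps the real scraping step for a simulated one. The names `scrape_once()` and `collect_readings()`, the column names `A1` and `A2`, and the random values are all assumptions made for illustration, not part of the original code. It also uses base `rbind()` instead of `plyr::rbind.fill()`, which is enough as long as every row has the same columns.

```r
# Hypothetical stand-in for the scraping step: returns one row per call.
# In real use, replace the body with the read_html()/html_node() pipeline
# from the question. The values here are random, not real data.
scrape_once <- function() {
  data.frame(A1 = sample(90:110, 1), A2 = sample(190:210, 1))
}

# Collect n_runs readings, pausing wait_seconds between consecutive runs.
collect_readings <- function(n_runs, wait_seconds) {
  total <- data.frame()            # initialised once, before the loop
  for (i in seq_len(n_runs)) {
    df <- scrape_once()            # fresh one-row data frame
    df$Time <- Sys.time()          # timestamp of this run
    total <- rbind(total, df)      # append to the rows collected so far
    if (i < n_runs) Sys.sleep(wait_seconds)
  }
  total
}

# Short demonstration: three runs with no waiting.
total <- collect_readings(n_runs = 3, wait_seconds = 0)
print(total)
```

For the five-minute schedule from the question, the call would be something like `collect_readings(n_runs = 100, wait_seconds = 5 * 60)`. Growing a data frame with `rbind()` in a loop is fine at this scale; for many thousands of rows, collecting the rows in a list and binding once at the end would likely be faster.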

SebSta