
I'm collecting some economic indicator data. As part of this, I also want to collect hourly tweet counts with the script below. I asked a similar question with simpler data before. As the historical data grows, the run times get longer. Since the result table will be a data frame, can I run this script more efficiently with functions such as the apply family or do.call?

library(httr)
library(dplyr)
library(lubridate)
library(tidyverse)
library(stringr)

sel1<-c('"#fed"','"#usd"','"#ecb"','"#eur"')

for (i in sel1) 
{
  for (ii in 1:20){
    headers = c(
      `Authorization` = 'Bearer #enter your Bearer token#'
    )
    
    params = list(
      `query` =i,
      # my system clock is in a local time zone, so format the timestamps in GMT
      `start_time` = strftime(Sys.time() - (ii+1)*60*60, "%Y-%m-%dT%H:%M:%SZ", tz = 'GMT'),
      `end_time` = strftime(Sys.time() - ii*60*60, "%Y-%m-%dT%H:%M:%SZ", tz = 'GMT'),
      `granularity` = 'hour'
    )
    
    res1<- httr::GET(url = 'https://api.twitter.com/2/tweets/counts/recent', httr::add_headers(.headers=headers), query = params) %>% 
      content( as = 'parsed')
    x1<-cbind(data.frame(res1),topic=str_replace_all(i, "([\n\"#])", ""))
    
    if(!exists("appnd1")){
      appnd1 <- x1
    } else{
      appnd1 <- rbind(appnd1, x1)
    }
  }
}
aliyousefian

1 Answer


In general, iteratively rbind-ing data in a for loop scales poorly: each rbind copies the entire accumulated frame into a new object, so you briefly hold two copies of everything, and the total copying grows quadratically with the number of iterations. With small data this is not noticeable, but as your history grows it becomes a real problem. (This is covered in the R Inferno, chapter 2, "Growing Objects". It's good reading, even if it is not a recent document.)
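A toy illustration of the difference (not your API code; just two hypothetical builders of the same frame): one grows with rbind on every iteration, the other collects into a pre-allocated list and combines once at the end.

```r
# Growing with rbind copies the whole accumulated frame each time.
grow <- function(n) {
  out <- NULL
  for (k in seq_len(n)) out <- rbind(out, data.frame(x = k))  # quadratic copying
  out
}

# Collecting into a list copies each piece only once, at the final combine.
collect <- function(n) {
  frames <- vector("list", n)                 # pre-allocate the list
  for (k in seq_len(n)) frames[[k]] <- data.frame(x = k)
  do.call(rbind, frames)                      # one combine at the end
}

identical(grow(100), collect(100))            # same result, very different scaling
```

Both produce the same data frame; the second does far less copying as n grows.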

The best approach is to create a list of frames (see https://stackoverflow.com/a/24376207/3358227), add each new frame to the list as you go, and then, when you are done, combine all of the frames in the list into a single frame in one step.

Untested, but try this modified process:

library(httr)
library(dplyr)
library(lubridate)
library(tidyverse)
library(stringr)

sel1<-c('"#fed"','"#usd"','"#ecb"','"#eur"')

listofframes <- list()

for (i in sel1) {
  for (ii in 1:20){
    headers = c(
      `Authorization` = 'Bearer #enter your Bearer token#'
    )
    
    params = list(
      `query` =i,
      # my system clock is in a local time zone, so format the timestamps in GMT
      `start_time` = strftime(Sys.time() - (ii+1)*60*60, "%Y-%m-%dT%H:%M:%SZ", tz = 'GMT'),
      `end_time` = strftime(Sys.time() - ii*60*60, "%Y-%m-%dT%H:%M:%SZ", tz = 'GMT'),
      `granularity` = 'hour'
    )
    
    res1<- httr::GET(url = 'https://api.twitter.com/2/tweets/counts/recent', httr::add_headers(.headers=headers), query = params) %>% 
      content( as = 'parsed')
    x1<-cbind(data.frame(res1),topic=str_replace_all(i, "([\n\"#])", ""))
    listofframes <- c(listofframes, list(x1))
  }
}

# choose one of the following, based on your R-dialect/package preference
appnd1 <- do.call(rbind, listofframes)           # base R
appnd1 <- dplyr::bind_rows(listofframes)         # dplyr
appnd1 <- data.table::rbindlist(listofframes)    # data.table
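One caveat when choosing among the three (an aside, using made-up frames `a` and `b`, in case the API ever returns frames with differing columns): base rbind requires identical column sets, while the other two can fill gaps.

```r
a <- data.frame(x = 1)
b <- data.frame(x = 2, y = "extra")

# do.call(rbind, list(a, b))                    # error: columns must match
dplyr::bind_rows(list(a, b))                    # missing columns filled with NA
data.table::rbindlist(list(a, b), fill = TRUE)  # same, but requires fill = TRUE
```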
r2evans