
I have several existing data.frame objects that need to be updated from the Internet. However, as the updates have the same names as the mentioned existing objects, I put the updates in a separate environment also as data.frame objects.

Then, the idea is to append the updates to the existing data.frame objects. But I don't see how I can do that iteratively (i.e., in a loop?) with rbind from one environment to GlobalEnv (or another environment, for that matter).

Also, although I did not include them here, there will be several other data.frame objects (with other names) in the GlobalEnv (or in whichever environment they are loaded into).

Below is a piece of code that should be reproducible (with comments and links to the sources):

library(quantmod)

# Load ticker data from 2020-01-01 till 2021-02-02
tickers <- c("NKLA", "MPNGF", "RMO", "JD", "MSFT")
getSymbols.yahoo(tickers, auto.assign = TRUE, env = globalenv(), 
                 from = "2020-01-01", to = "2021-02-02")

# Close all Internet connections as a precaution
# https://stackoverflow.com/a/52758758/2950721
closeAllConnections()

# Find xts objects
xtsObjects <- names(which(unlist(eapply(.GlobalEnv, is.xts))))

# Convert xts to data.frame
# https://stackoverflow.com/a/69246047/2950721
for (i in seq_along(xtsObjects)) {
  assign(xtsObjects[i], fortify.zoo(get(xtsObjects[i])))
}


# Redo the previous process but in separate environment for updated
# values of the same tickers (comments and sources are not repeated)
symbolUpdates.env <- new.env()

getSymbols.yahoo(tickers, auto.assign = TRUE, env = symbolUpdates.env,
                 from = "2021-02-03")

closeAllConnections()

symbolUpdatesXtsObjects <- names(which(unlist(eapply(symbolUpdates.env, 
                                                     is.xts))))

for (i in seq_along(symbolUpdatesXtsObjects)) {
  assign(envir = symbolUpdates.env, symbolUpdatesXtsObjects[i], 
         fortify.zoo(get(symbolUpdatesXtsObjects[i], 
                         envir = symbolUpdates.env)))
}

# Find data.frame objects in both GlobalEnv and
# symbolUpdates.env
globalEnvDataframeObjects <- names(which(unlist(eapply(.GlobalEnv, 
                                                        is.data.frame))))
symbolUpdatesDataframeObjects <- names(which(unlist(eapply(symbolUpdates.env, 
                                                           is.data.frame))))


# This rbind definitely does not work!!!
for (i in seq_along(globalEnvDataframeObjects)) {
  rbind(envir = .GlobalEnv, globalEnvDataframeObjects[i], envir =
  symbolUpdates.env, symbolUpdatesDataframeObjects[i])
}

My questions:

  • Preferably using only base R (no additional packages), what code can iteratively append each object in symbolUpdatesDataframeObjects to the corresponding object in globalEnvDataframeObjects?
  • Would the code be the same if globalEnvDataframeObjects were in another environment (i.e., not .GlobalEnv, but a "sub-environment" like symbolUpdates.env)?
    • If not, what would change?
  • Is there a better/wiser approach than the one I'm trying to use?

Thanks in advance.


Systems used:

  • R version: 4.1.1 (2021-08-10)
  • RStudio version: 1.4.1717
  • OS: macOS Catalina version 10.15.7 and macOS Big Sur version 11.6
pdeli
  • Where do you want to store the rbind-ed objects? – akrun Oct 07 '21 at 21:51
  • The `symbolUpdatesDataframeObjects` data.frames have a duplicated Index column, so they have 8 columns while the interObj ones have 7 for each ticker. Therefore we need to remove the extra column. I used `[-1]` – akrun Oct 07 '21 at 21:58
  • Also, the column names in `symbolUpdatesDataframeObjects` seem to be assigned differently. You can check `sapply(mget(symbolUpdatesDataframeObjects, envir = symbolUpdates.env), names)` – akrun Oct 07 '21 at 22:00
  • Can you please correct those errors? The code below should work then. – akrun Oct 07 '21 at 22:01
  • Thanks for your comments, akrun. Could you kindly tell me where to put the `[-1]`? – pdeli Oct 08 '21 at 10:43
  • Any ideas as to why an additional `Index` column appears in `symbolUpdates.env`? I rechecked the whole code and ran it several times, and every run adds an extra `Index` column when the objects are in an environment other than `.GlobalEnv`. – pdeli Oct 08 '21 at 10:46
  • Ok, I got the answer: the `fortify.zoo(get(symbolUpdatesXtsObjects[i]))` at the end of the code should have read `fortify.zoo(get(symbolUpdatesXtsObjects[i], envir = symbolUpdates.env))`. The code above now works for me, meaning no additional `Index` column. :-) – pdeli Oct 08 '21 at 17:18
  • @pdeli your code is not R-like: using assign the way you constantly do is considered bad R (please see here) https://stackoverflow.com/questions/17559390/why-is-using-assign-bad. Please see my solution below. – hello_friend Oct 08 '21 at 23:34
  • @hello_friend, thanks for your comment, and you are probably right. However, the piece of code I need will have to be integrated into a much bigger one, and using your solutions would require a complete rewrite, which I will definitely have to do sometime in the future. In the meantime, kindly check out my comment on your post. – pdeli Oct 09 '21 at 14:29
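pdeli's fix above comes down to how `get()` resolves names: without `envir`, it searches starting from the environment it is called from (`.GlobalEnv` in a top-level loop), so it returned the object of the same name that already existed in `.GlobalEnv` rather than the one in `symbolUpdates.env`. A minimal base R illustration, where `toy_ticker` and `demo_env` are hypothetical names:

```r
# Two objects with the same (hypothetical) name in two environments
demo_env <- new.env()
assign("toy_ticker", data.frame(Close = 1), envir = .GlobalEnv)
assign("toy_ticker", data.frame(Close = 2), envir = demo_env)

# No envir: the name is resolved from the calling environment (.GlobalEnv here)
from_default <- get("toy_ticker")
# envir given: the name is resolved in demo_env
from_updates <- get("toy_ticker", envir = demo_env)

from_default$Close  # 1
from_updates$Close  # 2
```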

2 Answers


We may need `intersect` here:

interObj <- intersect(globalEnvDataframeObjects, symbolUpdatesDataframeObjects)
interObj <- interObj[match(interObj, symbolUpdatesDataframeObjects)]
nrow(get(interObj[1]))
# [1] 273
for (i in seq_along(interObj)) {
  assign(interObj[i], rbind(get(interObj[i], envir = .GlobalEnv), 
    get(symbolUpdatesDataframeObjects[i], envir = symbolUpdates.env)), envir = .GlobalEnv)
}
akrun
  • akrun, would you mind pointing me to the repository where `intersect` is? I keep getting warnings that `package ‘Intersect’ is not available for this version of R`. I of course enabled all the repositories with `setRepositories()` and checked with `available.packages()`. – pdeli Oct 08 '21 at 17:42
  • @pdeli `intersect` is a `base R` function. It is all lowercase. – akrun Oct 08 '21 at 18:16
  • Oops, my bad. Sorry, akrun. Your code works up to `nrow(get(interObj[1]))`. However, when I run the `for` loop, I get the following message: `Error in match.names(clabs, names(xi)) : names do not match previous names`. Could it be because `interObj` lists the objects in a different order after the `match` (i.e., before: `[1] "NKLA" "MPNGF" "MSFT" "JD" "RMO"`, after: `[1] "MPNGF" "RMO" "JD" "MSFT" "NKLA"`)? – pdeli Oct 08 '21 at 18:50
  • I found the solution. I used `sort` so that both `symbolUpdatesDataframeObjects` and `interObj` have the same order, and now it seems to work every time. – pdeli Oct 11 '21 at 18:10
  • @pdeli The `match` should also work. Maybe there were some extra elements that were not in interObj, which resulted in `NA`. – akrun Oct 11 '21 at 18:14
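A sketch of akrun's loop that sidesteps the ordering problem discussed in these comments: it looks up the same name `nm` in both environments, so neither `match` nor `sort` is needed. Because the target environment is a parameter, it works the same whether the existing objects live in `.GlobalEnv` or in a sub-environment such as `symbolUpdates.env`. The helper name `append_updates` and the toy data are hypothetical, and it assumes both data.frames for a ticker have identical column names, since `rbind` otherwise fails with "names do not match previous names".

```r
# Append each data.frame in `from` to the data.frame with the same name in `to`.
# `append_updates` is a hypothetical helper, not part of any package.
append_updates <- function(to, from) {
  # Names of the data.frame objects stored in an environment
  df_names <- function(env) {
    Filter(function(nm) is.data.frame(get(nm, envir = env)), ls(env))
  }
  common <- intersect(df_names(to), df_names(from))
  for (nm in common) {
    # Same name on both sides, so the order of the name vectors never matters
    assign(nm, rbind(get(nm, envir = to), get(nm, envir = from)), envir = to)
  }
  invisible(common)
}

# Toy data standing in for the downloaded tickers
old_env <- new.env()
upd_env <- new.env()
old_env$MSFT <- data.frame(Index = as.Date(c("2021-02-01", "2021-02-02")), Close = c(3, 4))
old_env$NKLA <- data.frame(Index = as.Date(c("2021-02-01", "2021-02-02")), Close = c(1, 2))
upd_env$NKLA <- data.frame(Index = as.Date("2021-02-03"), Close = 6)
upd_env$MSFT <- data.frame(Index = as.Date("2021-02-03"), Close = 5)

append_updates(old_env, upd_env)
nrow(old_env$MSFT)  # 3
```

With the objects from the question, the call would be `append_updates(.GlobalEnv, symbolUpdates.env)`; for a sub-environment, pass that environment as the first argument instead.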

If the data.frames must be stored in multiple environments, use the following:

# Install packages if they are not already installed: necessary_packages => vector
necessary_packages <- c("quantmod")

# Create a vector containing the names of any packages needing installation:
# new_packages => vector
new_packages <- necessary_packages[!(necessary_packages %in%
                                       installed.packages()[, "Package"])]

# If the vector has more than 0 values, install the new packages
# (and their associated dependencies):
if(length(new_packages) > 0){
  install.packages(
    new_packages, 
    dependencies = TRUE
  )
}

# Initialise the packages in the session: list of boolean => stdout (console)
lapply(
  necessary_packages, 
  require, 
  character.only = TRUE
)

# Load ticker data from 2020-01-01 till 2021-02-02
tickers <- c(
  "NKLA", 
  "MPNGF", 
  "RMO", 
  "JD", 
  "MSFT"
)

# Create a new environment: environment => symbolUpdates.env
symbolUpdates.env <- new.env()

# Create a vector of from dates: from_dates => Date Vector
from_dates <- as.Date(
  c(
    "2020-01-01", 
    "2021-02-03"
  )
)

# Create a vector of to dates:
to_dates <- as.Date(
  c(
    "2021-02-02", 
    format(
      Sys.Date(),
      "%Y-%m-%d"
    )
  )
)

# Create a vector of environments: env_vec => vector of environments
env_vec <- c(
  .GlobalEnv, 
  symbolUpdates.env
)

# Function to retrieve tickers as data.frames: 
# retrieve_ticker_df => function()
retrieve_ticker_df <- function(ticker_vec, from_date, to_date){
  
  # Create a list of size length(tickers):
  # df_list => empty list
  df_list <- vector(
    "list", 
    length(ticker_vec)
  )
  
  # Store each ticker's response as a data.frame in the list:
  # df_list => list of data.frames
  df_list <- setNames(
    lapply(
      seq_along(ticker_vec),
      function(i){
        # Retrieve the data.frame: tmp => data.frame
        tmp <- getSymbols.yahoo(
          ticker_vec[i],
          auto.assign = FALSE, 
          from = from_date,
          to = to_date,
          return.class = 'data.frame'
        )
        
        # Close all Internet connections as a precaution
        # https://stackoverflow.com/a/52758758/2950721
        closeAllConnections()
        
        # Create a data.frame and revert index to sequential
        # integers: data.frame => env
        data.frame(
          cbind(
            date = as.Date(
              row.names(
                tmp
              )
            ),
            tmp
          ),
          row.names = NULL
        )
      }
    ),
    ticker_vec
  )
  # Explicitly define returned object: list of data.frames => env
  return(df_list)
}

# Store all the data.frames in a list of data.frames, 
# store each list of data.frames in a list: 
# ticker_df_list_list => list of list of data.frames
ticker_df_list_list <- lapply(
  seq_along(env_vec),
  function(i){
    retrieve_ticker_df(
      tickers, 
      from_dates[i], 
      to_dates[i]
    )
  }
)

# Push each of the lists to the appropriate environment: 
# data.frames => env
lapply(
  seq_along(ticker_df_list_list),
  function(i){
    list2env(
      ticker_df_list_list[[i]],
      envir = env_vec[[i]]
    )
  }
)

# Initialise an empty list to create some memory
# bound_df_list => empty list
bound_df_list <- vector(
  "list", 
  length(tickers)
)

# Allocate some memory by initialising an
# empty list: ir_list => list
ir_list <- vector(
  "list",
  length(env_vec) * length(tickers)
)

# Unlist the env_vec, and retrieve the ticker
# data.frames: ir_list => list of data.frames
ir_list <- unlist(
  lapply(
    env_vec,
    function(x){
      mget(
        tickers, 
        envir = x
      )
    }
  ),
  recursive = FALSE
)

# Split-apply-combine based on the 
# data.frame names: bound_df_list => list of data.frames
bound_df_list <- lapply(
  split(
    ir_list,
    names(ir_list)
  ),
  function(x){
    do.call(
      rbind, 
      x
    )
  }
)

# Clear up the intermediate objects:
rm(ticker_df_list_list, ir_list, env_vec); gc()
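The split-apply-combine step at the end of this answer can be tried offline with small stand-in lists (hypothetical data in place of the Yahoo downloads). `unlist(..., recursive = FALSE)` flattens one level, so each ticker name appears once per date range; `split()` then groups the data.frames by name, and `do.call(rbind, ...)` stacks each group. The last line is an optional addition, not part of the answer: it drops rows whose date appears twice, in case the two date ranges overlap.

```r
# Stand-ins for ticker_df_list_list: one list per date range, named by ticker
first_range <- list(
  JD   = data.frame(date = as.Date("2021-02-01"), close = 1),
  MSFT = data.frame(date = as.Date("2021-02-01"), close = 2)
)
second_range <- list(
  JD   = data.frame(date = as.Date("2021-02-03"), close = 3),
  MSFT = data.frame(date = as.Date("2021-02-03"), close = 4)
)

# Flatten one level: a list of data.frames whose names repeat per range
ir_list <- unlist(list(first_range, second_range), recursive = FALSE)
names(ir_list)  # "JD" "MSFT" "JD" "MSFT"

# Group the data.frames by ticker name and row-bind each group
bound_df_list <- lapply(split(ir_list, names(ir_list)), function(x) do.call(rbind, x))

# Optional guard against overlapping ranges: keep the first row for each date
bound_df_list <- lapply(bound_df_list, function(df) df[!duplicated(df$date), ])
```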

If it is not mandatory to use multiple environments:

# Install packages if they are not already installed: necessary_packages => vector
necessary_packages <- c("quantmod")

# Create a vector containing the names of any packages needing installation:
# new_packages => vector
new_packages <- necessary_packages[!(necessary_packages %in%
                                       installed.packages()[, "Package"])]

# If the vector has more than 0 values, install the new packages
# (and their associated dependencies):
if(length(new_packages) > 0){
  install.packages(
    new_packages, 
    dependencies = TRUE
  )
}

# Initialise the packages in the session: list of boolean => stdout (console)
lapply(
  necessary_packages, 
  require, 
  character.only = TRUE
)

# Load ticker data from 2020-01-01 till 2021-02-02
tickers <- c(
  "NKLA", 
  "MPNGF", 
  "RMO", 
  "JD", 
  "MSFT"
)

# Create a new environment: environment => symbolUpdates.env
symbolUpdates.env <- new.env()

# Create a vector of from dates: from_dates => Date Vector
from_dates <- as.Date(
  c(
    "2020-01-01", 
    "2021-02-03"
  )
)

# Create a vector of to dates:
to_dates <- as.Date(
  c(
    "2021-02-02", 
    format(
      Sys.Date(),
      "%Y-%m-%d"
    )
  )
)

# Function to retrieve tickers as data.frames: 
# retrieve_ticker_df => function()
retrieve_ticker_df <- function(ticker_vec, from_date, to_date){

  # Create a list of size length(tickers):
  # df_list => empty list
  df_list <- vector(
    "list", 
    length(ticker_vec)
  )
  
  # Store each ticker's response as a data.frame in the list:
  # df_list => list of data.frames
  df_list <- setNames(
    lapply(
      seq_along(ticker_vec),
      function(i){
        # Retrieve the data.frame: tmp => data.frame
        tmp <- getSymbols.yahoo(
          ticker_vec[i],
          auto.assign = FALSE, 
          from = from_date,
          to = to_date,
          return.class = 'data.frame'
        )
        
        # Close all Internet connections as a precaution
        # https://stackoverflow.com/a/52758758/2950721
        closeAllConnections()
        
        # Create a data.frame and revert index to sequential
        # integers: data.frame => env
        data.frame(
          cbind(
            date = as.Date(
              row.names(
                tmp
              )
            ),
            tmp
          ),
          row.names = NULL
        )
      }
    ),
    ticker_vec
  )
  # Explicitly define returned object: list of data.frames => env
  return(df_list)
}

# Store all the data.frames in a list of data.frames, 
# store each list of data.frames in a list: 
# ticker_df_list_list => list of list of data.frames
ticker_df_list_list <- lapply(
  seq_along(from_dates),
  function(i){
    retrieve_ticker_df(
      tickers, 
      from_dates[i], 
      to_dates[i]
    )
  }
)

# Initialise an empty list to create some memory:
# ir_list => empty list
ir_list <- vector(
  "list",
  length(tickers) * length(from_dates)
)

# Populate the list with each of the named data.frames: 
# ir_list => list of data.frames
ir_list <- unlist(
  ticker_df_list_list, 
  recursive = FALSE
)

# Initialise an empty list to create some memory
# bound_df_list => empty list
bound_df_list <- vector(
  "list", 
  length(tickers)
)

# Split-apply-combine: bound_df_list => list of data.frames
bound_df_list <- lapply(
  split(
    ir_list,
    names(ir_list)
  ),
  function(x){
    do.call(
      rbind, 
      x
    )
  }
)

# Clear up the intermediate objects:
rm(ticker_df_list_list, ir_list); gc()
hello_friend
  • Thanks @hello_friend. Both code snippets you posted work fine. However, the code where the solution will go loads all the tickers from local storage. Then, the code checks which ones are not up to date and downloads the delta between their last date and today (let's call them delta-tickers). Then I need to append the delta-tickers to the not-up-to-date tickers. That is why my question is: "how do I append the delta between the last date of the not-up-to-date tickers and today to those tickers?". – pdeli Oct 09 '21 at 14:07
  • @pdeli cool, then you only need to update the from and to dates vectors to be dynamic and then you can get a delta export. – hello_friend Oct 10 '21 at 11:49
  • Thank you @hello_friend. Will try what you suggested and come back to you. – pdeli Oct 11 '21 at 19:41
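hello_friend's last suggestion (derive the dates dynamically, then download only the delta) might look like this in base R. A minimal sketch, assuming the stored tickers are kept in a named list of data.frames with a Date column called `Index` (as `fortify.zoo()` produces). `append_delta()` and `fake_fetch()` are hypothetical names; `fake_fetch()` only stands in for the real download, which could wrap `getSymbols.yahoo(..., auto.assign = FALSE, return.class = "data.frame")` plus the row-name-to-date conversion used in hello_friend's answer.

```r
# Append only the missing rows ("delta") to each stored ticker.
# `fetch` is a hypothetical function(ticker, from, to) returning a data.frame
# with the same columns as the stored one.
append_delta <- function(stored, fetch, today = Sys.Date()) {
  for (tk in names(stored)) {
    from <- max(stored[[tk]]$Index) + 1  # day after the last stored date
    if (from > today) next               # already up to date
    delta <- fetch(tk, from, today)
    if (nrow(delta) > 0) stored[[tk]] <- rbind(stored[[tk]], delta)
  }
  stored
}

# Offline stand-in for the download step, for illustration only
fake_fetch <- function(ticker, from, to) {
  data.frame(Index = seq(from, to, by = "day"), Close = 0)
}

stored <- list(
  NKLA = data.frame(Index = as.Date(c("2021-02-01", "2021-02-02")), Close = c(1, 2)),
  MSFT = data.frame(Index = as.Date(c("2021-02-01", "2021-02-05")), Close = c(3, 4))
)
updated <- append_delta(stored, fake_fetch, today = as.Date("2021-02-05"))
nrow(updated$NKLA)  # 5 (three new days appended)
nrow(updated$MSFT)  # 2 (already up to date)
```

If the tickers live in an environment rather than a list, `as.list(env)` and `list2env(updated, envir = env)` convert between the two forms.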