I have multiple data.frame objects of unequal lengths. I would like to find the most recent date in all of them and store the data somewhere.

Here is an example of hopefully reproducible code to illustrate what I would like (with comments and sources). This gives 7 data.frame objects of variable lengths:

library(quantmod)

# Load ticker data from 2020-01-01 till 2021-02-02
tickers <- c("NKLA", "MPNGF", "RMO", "JD", "COIN")
getSymbols.yahoo(tickers, auto.assign = TRUE, env = globalenv(), from = "2020-01-01", to = "2021-02-02")

# Load ticker data from 2020-01-01 till yesterday (if not weekend or holiday)
tickers2 <- c("IBM", "AAPL", "MRNA")
getSymbols.yahoo(tickers2, auto.assign = TRUE, env = globalenv(), from = "2020-01-01")

# Close all Internet connections as a precaution
# https://stackoverflow.com/a/52758758/2950721
closeAllConnections()

# Find xts objects
xtsObjects <- names(which(unlist(eapply(.GlobalEnv, is.xts))))

# Convert xts to data.frame
# https://stackoverflow.com/a/69246047/2950721
for (i in seq_along(xtsObjects)) {
  assign(xtsObjects[i], fortify.zoo(get(xtsObjects[i])))
}

# 1st column name from Index to Date
# https://stackoverflow.com/a/69292036/2950721
for (i in seq_along(xtsObjects)) {
  tmp <- get(xtsObjects[i])
  colnames(tmp)[colnames(tmp) == "Index"] <- "Date"
  assign(xtsObjects[i], tmp)
}
remove(tmp)

Retrieving the dates individually is pretty straightforward:

max(AAPL$Date)
max(IBM$Date)
max(JD$Date)
max(MPNGF$Date)
max(MRNA$Date)
max(NKLA$Date)
max(RMO$Date)

But when I try the following code, neither version displays or, better yet, stores the most recent dates together with their origin (i.e., the ticker):

dataframeObjects <- names(which(unlist(eapply(.GlobalEnv, is.data.frame))))

# Attempt 1
for (i in seq_along(dataframeObjects)) {
  mostRecentDates <- max(dataframeObjects[i]$Date)
}

# Attempt 2
for (i in 1:length(dataframeObjects)) {
  mostRecentDates <- max(dataframeObjects[i]["Date"])
}

Both attempts give [1] NA when I print the variable mostRecentDates.

Important: The final code will have no tickers or tickers2 variables. A number of data.frame objects will be loaded locally, and those are the ones that will be searched for the most recent date available.

My question:

  • What code is needed in order to store the most recent dates of all data.frame objects (if possible by invoking dataframeObjects, but not tickers and tickers2)?

Thanks in advance.


Systems used:

  • R version: 4.1.1 (2021-08-10)
  • RStudio version: 1.4.1717
  • OS: macOS Catalina version 10.15.7 and macOS Big Sur version 11.6
pdeli

2 Answers

We can take the intersect of the object names returned by ls() and the ticker names, use mget to fetch the values of those objects into a list, loop over the list with lapply, extract the 'Date' column from each element, and take its max:

do.call(c, lapply(mget(intersect(c(tickers, tickers2), ls())), 
       function(x) max(x$Date)))

-output

        NKLA        MPNGF          RMO           JD          IBM         AAPL         MRNA 
"2021-02-01" "2021-02-01" "2021-02-01" "2021-02-01" "2021-09-28" "2021-09-28" "2021-09-28" 

Update

If the only data.frame objects in the global environment are the ones whose names are stored in dataframeObjects, then do

do.call(c, lapply(mget(dataframeObjects), function(x) max(x$Date)))
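
As a minimal, self-contained sketch of the same idea without any download: the toy data frames `AAA` and `BBB` and the vector `objNames` below are hypothetical stand-ins for the ticker objects and for `dataframeObjects`. The `envir = environment()` argument is only there to make the lookup location explicit.

```r
# Toy stand-ins for the downloaded tickers (hypothetical names and values)
AAA <- data.frame(Date = as.Date(c("2021-01-04", "2021-02-01")), Close = c(10, 11))
BBB <- data.frame(Date = as.Date(c("2021-09-27", "2021-09-28")), Close = c(20, 21))

# Plays the role of dataframeObjects: a character vector of object names
objNames <- c("AAA", "BBB")

# Subsetting a name (a plain string) never reaches the data, hence the NA
nameOnlyResult <- max(objNames[1]["Date"])

# mget() looks the names up and returns the actual data frames as a named list
mostRecentDates <- do.call(
  c,
  lapply(mget(objNames, envir = environment()), function(x) max(x$Date))
)
print(mostRecentDates)
```

The result is a named Date vector, so each date stays attached to the name of the object it came from.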

In the OP's code, dataframeObjects holds only the names of the objects (character strings), not the objects themselves. We need get inside the loop to return the value behind each name

# // in case there are other data.frame objects as well, get the intersect
nm1 <- intersect(dataframeObjects, c(tickers, tickers2))
# // create a `list` to store the output
out <- vector('list', length(nm1))
names(out) <- nm1
for(i in seq_along(nm1)) {
   out[[i]] <- max(get(nm1[i])$Date)
}

-output

> out
$RMO
[1] "2021-02-01"

$NKLA
[1] "2021-02-01"

$JD
[1] "2021-02-01"

$AAPL
[1] "2021-09-28"

$IBM
[1] "2021-09-28"

$MRNA
[1] "2021-09-28"

$MPNGF
[1] "2021-02-01"
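
If a table is more convenient than a list, the same loop can fill a data.frame that keeps each date next to its ticker. This is only a sketch on toy data: `AAA`, `BBB`, `nm` and `summaryDf` are hypothetical names, and the dates go through their numeric form because `vapply` cannot return a Date directly.

```r
# Toy data frames (hypothetical tickers and values)
AAA <- data.frame(Date = as.Date(c("2021-01-04", "2021-02-01")), Close = c(10, 11))
BBB <- data.frame(Date = as.Date(c("2021-09-27", "2021-09-28")), Close = c(20, 21))

# Names of the objects to scan (plays the role of nm1 above)
nm <- c("AAA", "BBB")

# get() turns each name into its data frame; dates are returned as day counts
lastDays <- vapply(nm, function(n) as.numeric(max(get(n)$Date)), numeric(1))

# One row per object: the ticker name and its most recent date
summaryDf <- data.frame(
  ticker   = nm,
  lastDate = as.Date(unname(lastDays), origin = "1970-01-01")
)
print(summaryDf)
```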
akrun
  • Thanks for the code, akrun. However, I would like something that does not involve the variables `tickers` and `tickers2`. What I gave was an example. In fact, there will be the `dataframeObjects <- names(which(unlist(eapply(.GlobalEnv, is.data.frame))))` line. In other words, I want to use the `dataframeObjects` variable. – pdeli Sep 30 '21 at 14:40
  • @pdeli The issue with `is.data.frame` on `.GlobalEnv` is that if you have already created other data.frame objects unrelated to the tickers, they will also be found. Instead, you may create a new env and put all these objects there – akrun Sep 30 '21 at 16:05
  • Understood. However, in the code where I am going to use the solution I am seeking here, `data.frame` objects are loaded locally and those are the ones that will need an update. Those that will be downloaded from the Internet will automatically be up to date. Also, I am very careful not to create any `data.frame` objects before the updates are done. In light of this info, how should your code be tweaked in order to rely only on `is.data.frame` and avoid the variables `tickers` and `tickers2`? – pdeli Oct 01 '21 at 11:14
  • @pdeli In that case, just change the code to `do.call(c, lapply(mget(ls()), function(x) max(x$Date)))` – akrun Oct 01 '21 at 15:52
  • Excellent, works perfectly! You mind updating your answer please? – pdeli Oct 01 '21 at 17:01
  • @pdeli updated the post – akrun Oct 01 '21 at 17:04
  • Sorry to bother you again, but your solution worked and now it doesn't anymore. When I run `do.call(c, lapply(mget(ls()), function(x) max(x$Date)))` it gives me an `Error: $ operator is invalid for atomic vectors`. Any ideas? – pdeli Oct 01 '21 at 17:19
  • @pdeli It is because you haven't converted the xts objects to data.frame, i.e., in your previous code you used `fortify.zoo` for the conversion – akrun Oct 01 '21 at 17:20
  • @pdeli Have you tried the step `for (i in seq_along(xtsObjects)) { assign(xtsObjects[i], fortify.zoo(get(xtsObjects[i]))) }` – akrun Oct 01 '21 at 17:25
  • For completeness' sake, the piece of code that finally worked for me, based on akrun's answers and the initial question: `do.call(c, lapply(mget(dataframeObjects), function(x) max(x$Date)))` – pdeli Oct 03 '21 at 15:59

I recommend storing the xts objects in an environment other than the global one, which makes them much easier to handle. We can turn that environment into a list and then iterate over the list with purrr::map() or base::lapply().

Here is what that can look like for your example.

library(quantmod)
library(tidyverse)
sym_env <- new.env()

tickers <- c("NKLA", "MPNGF", "RMO", "JD", "COIN")
getSymbols.yahoo(tickers, auto.assign = TRUE, env = sym_env, from = "2020-01-01", to = "2021-02-02")

tickers2 <- c("IBM", "AAPL", "MRNA")
getSymbols.yahoo(tickers2, auto.assign = TRUE, env = sym_env, from = "2020-01-01")

closeAllConnections()

as.list(sym_env) |> 
  map(fortify.zoo) |> 
  map(\(x) rename(x, Date=Index)) |> 
  map(\(x) max(x$Date))

Returns:

$RMO
[1] "2021-02-01"

$NKLA
[1] "2021-02-01"

$JD
[1] "2021-02-01"

$AAPL
[1] "2021-09-28"

$IBM
[1] "2021-09-28"

$MRNA
[1] "2021-09-28"

$MPNGF
[1] "2021-02-01"

In general, it is advisable to organize data objects that are supposed to be processed with the same function(s) in a list instead of having them mixed into the global environment. Therefore you should choose a method for obtaining the data that returns a list.

You could use any other strategy to obtain a list of xts objects and then feed that into the chain of purrr::map() commands.

list_of_xts_objects |> 
  map(fortify.zoo) |> 
  map(\(x) rename(x, Date=Index)) |> 
  map(\(x) max(x$Date))
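
The same pattern can be sketched for objects that are already data frames with a Date column, using base R's Filter() and lapply() in place of purrr::map(). Everything named here (`dataEnv`, `AAA`, `BBB`, `note`) is a hypothetical toy stand-in, not part of the original example. Filtering first means an object that is not a data frame is skipped rather than causing an error at `x$Date`.

```r
# A dedicated environment holding toy objects (hypothetical names and values)
dataEnv <- new.env()
assign("AAA", data.frame(Date = as.Date(c("2021-01-04", "2021-02-01")),
                         Close = c(10, 11)), envir = dataEnv)
assign("BBB", data.frame(Date = as.Date(c("2021-09-27", "2021-09-28")),
                         Close = c(20, 21)), envir = dataEnv)
assign("note", "not a data frame", envir = dataEnv)

# Keep only the data.frame objects that actually have a Date column
frames <- Filter(
  function(x) is.data.frame(x) && "Date" %in% names(x),
  as.list(dataEnv)
)

# Most recent date per object, returned as a named list
lastDates <- lapply(frames, function(x) max(x$Date))
print(lastDates)
```

Replacing `dataEnv` with `.GlobalEnv` would scan the global environment instead, at the cost of also picking up any unrelated data frame that happens to have a Date column.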
Till
  • Thank you for your answer @Till. Would it be possible to leave the `data.frame` objects and the `as.list` object in the current environment or is it compulsory to create a new one? – pdeli Sep 30 '21 at 14:45
  • No, it is not compulsory. As long as you have a list object that contains all xts objects you can use this. Please see my updated answer. – Till Sep 30 '21 at 20:56
  • Excellent. Thanks. To clarify, this is the sequence in which things would happen: 1. Load locally all existing `data.frame` objects (which hold previously downloaded tickers, thus probably not up to date) into `.GlobalEnv`; 2. Scan all the loaded `data.frame` objects for the last date (this is where the present code would be inserted); 3. Update all `data.frame` objects whose most recent date is older than yesterday. Knowing this, how could your code be tweaked so that it scans `data.frame` objects instead of `xts` objects from `.GlobalEnv`? – pdeli Oct 01 '21 at 11:33
  • Maybe I should add that under point 1, the `data.frame` objects already have their `Index` columns renamed to `Date`. – pdeli Oct 01 '21 at 12:06