
I am new to R and have a quick question (I have gone through a lot of questions on Stack Overflow, but to no avail).

I have created a function (shown in my code below) where x and y are dates and z1 to z9 are data frames. The function goes through the 9 data frames, subsets each one by the given dates, and returns a merged data set.

DATE1_May <- as.Date("2017-11-16")
DATE2_May <- as.Date("2018-02-15")

myfunc1 <- function(x,y,z1,z2,z3,z4,z5,z6,z7,z8,z9){
  a1 <- z1[z1$Date >= x & z1$Date <= y,]
  b1 <- a1[c(1,2)]
  b1 <- data.frame(b1)
  a2 <- z2[z2$Date >= x & z2$Date <= y,]
  b2 <- a2[c(1,2)]
  b2 <- data.frame(b2)
  a3 <- z3[z3$Date >= x & z3$Date <= y,]
  b3 <- a3[c(1,2)]
  b3 <- data.frame(b3)
  a4 <- z4[z4$Date >= x & z4$Date <= y,]
  b4 <- a4[c(1,2)]
  b4 <- data.frame(b4)
  a5 <- z5[z5$Date >= x & z5$Date <= y,]
  b5 <- a5[c(1,2)]
  b5 <- data.frame(b5)
  a6 <- z6[z6$Date >= x & z6$Date <= y,]
  b6 <- a6[c(1,2)]
  b6 <- data.frame(b6)
  a7 <- z7[z7$Date >= x & z7$Date <= y,]
  b7 <- a7[c(1,2)]
  b7 <- data.frame(b7)
  a8 <- z8[z8$Date >= x & z8$Date <= y,]
  b8 <- a8[c(1,2)]
  b8 <- data.frame(b8)
  a9 <- z9[z9$Date >= x & z9$Date <= y,]
  b9 <- a9[c(1,2)]
  b9 <- data.frame(b9)
  fin1 <- Reduce(function(x, y) merge(x, y, all=T, by=c("Date")), list(b1,b2,b3,b4,b5,b6,b7,b8,b9))
  }
Testx1 <- myfunc1(DATE1_May,DATE2_May, May18,July18, September18, December18,March19, May19, July19, September19, December19)    

I have 2 questions:

  1. I have written this code for a March18 futures contract. I want to do the same thing for the March17 contract, but in that case z1 to z9 will run from May17 to December18, and the dates would be:

    DATE1_May <- as.Date("2016-11-16")
    DATE2_May <- as.Date("2017-02-15")
    

    I was trying to create a for loop and use the assign command. However, I am not sure how to do so. Is there a way to automate this? (Right now, I am creating separate functions, but it's taking a lot of time since I have to do this for over 100 contracts.)

  2. Is there a way to shorten the function? (It works perfectly fine, though.)

Parfait
Raghav Goyal
  • Please add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). That way you can help others to help you! – dario Feb 09 '20 at 15:30
  • Ok, let me try that, – Raghav Goyal Feb 09 '20 at 16:28

2 Answers


It's difficult without a sample of what your data frames look like, but I would recommend working with the dplyr and purrr packages from the tidyverse.

Here you would iterate over each data frame in a list, filtering each one for dates between start_date and end_date. Lastly, you can use reduce (as before) to join the data frames together. reduce combines the list elements successively: it joins the first two data frames, then joins that result with the third, and so on. The joining function here is full_join, which keeps all rows from the data frames being joined.
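
To make the folding step concrete, here is a small sketch on toy data. The data frames `a`, `b`, and `c3` and their values are made up for illustration; only the `Date` key column is taken from the question.

```r
library(dplyr)
library(purrr)

# Three toy data frames sharing a Date key (hypothetical values)
a  <- data.frame(Date = as.Date(c("2018-01-02", "2018-01-03")), A = c(1, 2))
b  <- data.frame(Date = as.Date(c("2018-01-03", "2018-01-04")), B = c(10, 20))
c3 <- data.frame(Date = as.Date("2018-01-02"), C = 100)

# reduce() folds the list from left to right, which is equivalent to
# full_join(full_join(a, b, by = "Date"), c3, by = "Date")
joined <- reduce(
  list(a, b, c3),
  function(acc, nxt) full_join(acc, nxt, by = "Date")
)

# Every date that appears in any input is kept; gaps are filled with NA
print(joined)
```

Because a full join keeps unmatched rows from both sides, the result has one row per distinct date across all inputs and one column per input series.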

This could be written with a set of intermediate variables or using the %>% operator for very clean code.

If you need to perform these operations often, I would recommend wrapping these steps into a function.

library(tidyverse)

start_date <- as.Date("2017-11-16")
end_date <- as.Date("2018-02-15")

my_dfs <- list(z1, z2, z3, z4, z5, z6, z7, z8, z9)
my_dfs_filtered <- map(my_dfs, ~filter(.x, Date >= start_date & Date <= end_date))
my_dfs_joined <- reduce(my_dfs_filtered, full_join, by = "Date")

# as pipe
start_date <- as.Date("2017-11-16")
end_date <- as.Date("2018-02-15")

list(z1, z2, z3, z4, z5, z6, z7, z8, z9) %>% 
  map(~filter(.x, Date >= start_date & Date <= end_date)) %>% 
  reduce(full_join, by = "Date")
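
As a rough sketch of the "wrap it in a function" suggestion, the steps above could be packaged as follows. The name `join_between` and the toy data frames are hypothetical and not part of the original answer; the only assumption about the real data is that each data frame has a `Date` column.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: keep rows with start_date <= Date <= end_date in
# every data frame, then full-join the filtered data frames on Date.
join_between <- function(dfs, start_date, end_date) {
  dfs %>%
    map(function(df) filter(df, Date >= start_date, Date <= end_date)) %>%
    reduce(function(acc, nxt) full_join(acc, nxt, by = "Date"))
}

# Toy inputs standing in for the real contract data (made-up values)
may18 <- data.frame(Date = as.Date("2017-11-15") + 0:3, May18 = c(1, 2, 3, 4))
jul18 <- data.frame(Date = as.Date("2017-11-16") + 0:3, July18 = c(5, 6, 7, 8))

out <- join_between(
  list(may18, jul18),
  start_date = as.Date("2017-11-16"),
  end_date   = as.Date("2017-11-18")
)
print(out)
```

Taking the data frames as a single list argument, rather than nine separate parameters, means the same helper can be reused for any contract and any number of series.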

Carsten Stann

Consider generalizing the repetitive code: use ... to accept any number of data frames, build a list of subsets with lapply, and then chain-merge the list with Reduce, all in base R:

df_build <- function(x, y, ...) { 
  # Collect the data frames passed via ... into a list before iterating;
  # lapply(..., f) would treat the second data frame as the function to apply
  df_list <- lapply(list(...), function(df)
      # ROW AND COLUMN INDEXING
      df[df$Date >= x & df$Date <= y, c(1,2)] 
  )

  # CHAIN MERGE FULL JOIN (returned as the function's value)
  Reduce(function(x, y) merge(x, y, all=TRUE, by=c("Date")), 
         df_list)      
}

# MAY 2018 FUTURES
DATE1_May <- as.Date("2017-11-16") 
DATE2_May <- as.Date("2018-02-15") 

may_2018_df <- df_build(DATE1_May, DATE2_May, 
                        May18, July18, September18, 
                        December18, March19, May19, 
                        July19, September19, December19)  

# MAY 2017 FUTURES
DATE1_May <- as.Date("2016-11-16") 
DATE2_May <- as.Date("2017-02-15")

may_2017_df <- df_build(DATE1_May, DATE2_May, 
                        May17, July17, September17, 
                        December17, March18, May18, 
                        July18, September18, December18)  

You can also build the list of May futures data frames dynamically, using get and paste0 to look up objects by name. The example below covers 2010 to 2018 and reuses df_build() from above. Adjust as needed.

may_futures_list <- lapply(c(2010:2018), function(yr) {
    DATE1_May <- as.Date(paste0(yr-1, "-11-16"))
    DATE2_May <- as.Date(paste0(yr, "-02-15"))

    # Assumes objects use two-digit years (e.g. May18), as in the question
    yy  <- sprintf("%02d", yr %% 100)
    yy1 <- sprintf("%02d", (yr + 1) %% 100)

    may_df <- df_build(DATE1_May, DATE2_May, 
                       get(paste0("May", yy)), 
                       get(paste0("July", yy)),
                       get(paste0("September", yy)), 
                       get(paste0("December", yy)), 
                       get(paste0("March", yy1)),
                       get(paste0("May", yy1)),
                       get(paste0("July", yy1)),
                       get(paste0("September", yy1)), 
                       get(paste0("December", yy1))
               )
})

# RENAME LIST ELEMENTS
may_futures_list <- setNames(may_futures_list,
                             as.character(c(2010:2018))
                    )

# RETRIEVE INDIVIDUAL DATA FRAMES
may_futures_list$`2018`
may_futures_list$`2017`
may_futures_list$`2016`
...
Parfait
  • I was trying this code; however, I get an error ```Error in match.fun(FUN) : 'get(paste0("July", yr))' is not a function, character or symbol ``` – Raghav Goyal Feb 09 '20 at 21:38
  • Possibly the object `JulyXX` does not exist in your global environment. If running through many years, check accordingly. This method assumes such data frame objects exist *before* it runs. – Parfait Feb 09 '20 at 22:45
  • BTW - it is not advised to carry hundreds of separate objects in global environment. Instead, use named lists (of many underlying, similar structured elements) as this solution shows. – Parfait Feb 09 '20 at 22:46
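
Following up on the named-list suggestion in the last comment, here is a minimal base R sketch of what that could look like. Everything in it is illustrative: `contracts`, `contract_names_for`, and `build_from_list` are hypothetical names, the prices are made up, and the naming scheme (month name plus two-digit year, nine contracts per target year) is inferred from the objects mentioned in the question.

```r
# Hypothetical helper: the nine contract names used for a given target year,
# e.g. 2017 -> May17 ... December17, March18 ... December18
contract_names_for <- function(yr) {
  yy <- sprintf("%02d", as.integer(c(yr, yr + 1) %% 100))
  c(paste0(c("May", "July", "September", "December"), yy[1]),
    paste0(c("March", "May", "July", "September", "December"), yy[2]))
}

# Same filter-and-merge logic as df_build(), but taking a list of data frames
build_from_list <- function(dfs, start, end) {
  subsets <- lapply(dfs, function(df) {
    df[df$Date >= start & df$Date <= end, c(1, 2)]
  })
  Reduce(function(a, b) merge(a, b, all = TRUE, by = "Date"), subsets)
}

# Toy stand-in for the real data: one data frame per contract, stored in a
# single named list instead of separate global objects (made-up prices)
dates <- seq(as.Date("2016-11-01"), as.Date("2018-03-01"), by = "month")
contracts <- lapply(contract_names_for(2017), function(nm) {
  setNames(data.frame(dates, seq_along(dates)), c("Date", nm))
})
names(contracts) <- contract_names_for(2017)

# One merged data frame per target year, looked up by name from the list
years <- 2017
results <- lapply(setNames(years, years), function(yr) {
  build_from_list(contracts[contract_names_for(yr)],
                  start = as.Date(paste0(yr - 1, "-11-16")),
                  end   = as.Date(paste0(yr, "-02-15")))
})

print(results[["2017"]])
```

With the data held in one list, extending `years` to more contracts only requires that the corresponding entries exist in `contracts`; indexing a missing name yields a `NULL` element, which is easy to check for before merging.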