Hey, I want to compute the variance of a column. My data frame is sorted by a date column in as.Date() format. Here you can see a snippet of it:

Date       USA       ARG       BRA         CHL          COL        MEX PER
2012-04-01   1 0.2271531 0.4970299 0.001956865 0.0005341452 0.07341428  NA
2012-05-01   1 0.2218906 0.4675895 0.001911405 0.0005273186 0.07026524  NA
2012-06-01   1 0.2054076 0.4531661 0.001891352 0.0005292575 0.06897811  NA
2012-07-01   1 0.2033470 0.4596730 0.001950686 0.0005312600 0.07269619  NA
2012-08-01   1 0.1993882 0.4596039 0.001980537 0.0005271514 0.07268987  NA
2012-09-01   1 0.1967152 0.4593390 0.002011212 0.0005305549 0.07418838  NA
2012-10-01   1 0.1972730 0.4597584 0.002002203 0.0005284380 0.07428555  NA
2012-11-01   1 0.1937618 0.4519187 0.001979805 0.0005238670 0.07329656  NA
2012-12-01   1 0.1854037 0.4500448 0.001993309 0.0005323795 0.07453949  NA
2013-01-01   1 0.1866007 0.4607501 0.002013112 0.0005412329 0.07551040  NA
2013-02-01   1 0.1855950 0.4712956 0.002011067 0.0005359562 0.07554661  NA

The data frame ranges from January 2004 to December 2018. But I do not want to compute the variance of the whole columns. I want to compute the variance over one year (12 values), with the window moving month by month.

I do not really know how to start. I can imagine using the zoo package and rollapply. But the problem there (I think) is that R uses the values around each point and not the values before it?

I also found this question: R: create a data frame out of a rolling window, so my idea was to get rid of the date column. It is easy to build the matrix, but now I do not understand how to apply the variance function to my data...

Is there a smart way to compute it all in one and also using the information of the date? If not, I also appreciate any other solution from you!

Jeremy S.
  • 2
    What is the expected output? Is it aggregated by year? – Cole Sep 01 '19 at 19:58
  • I expect the output to look like this (new data frame): First row: Variance from Jan-2004 to Dec-2004 Second row: Variance from Feb-2004 to Jan-2005 Last row: Variance from Jan-2018 to Dec-2018 – Jeremy S. Sep 01 '19 at 20:03
  • 2
    You forgot to post the *this* part. Anyway, try `year <- format(df1$Date, "%Y");aggregate(as.matrix(df1[-1]), list(year), FUN = var)`. – Rui Barradas Sep 01 '19 at 20:05
  • It is hard for me to explain. I do not want to aggregate it just by a specific year. I want to aggregate it by the past 12 months. After it, moving the window by one month further. – Jeremy S. Sep 01 '19 at 20:13

1 Answer

We can use rollapplyr to perform the rolling computations. Since there are only 11 rows in the data in the question, we can't compute 12-month variances, so we illustrate it with 3-month variances instead. Remove fill = NA if you want to omit the NA rows, or replace it with partial = TRUE if you want variances computed from fewer values than the full window near the beginning. If you want a data frame result, use fortify.zoo(zv).

library(zoo)

z <- read.zoo(DF)
zv <- rollapplyr(z, 3, var, fill = NA)
zv

giving this zoo object:

           USA          ARG          BRA          CHL          COL          MEX PER
2012-04-01  NA           NA           NA           NA           NA           NA  NA
2012-05-01  NA           NA           NA           NA           NA           NA  NA
2012-06-01   0 1.287083e-04 4.998008e-04 1.126781e-09 1.237524e-11 5.208793e-06  NA
2012-07-01   0 1.033001e-04 5.217420e-05 9.109406e-10 3.883996e-12 3.565057e-06  NA
2012-08-01   0 9.358558e-06 1.396497e-05 2.060928e-09 4.221043e-12 4.600220e-06  NA
2012-09-01   0 1.113297e-05 3.108380e-08 9.159058e-10 4.826929e-12 7.453672e-07  NA
2012-10-01   0 1.988357e-06 4.498977e-08 2.485889e-10 2.953403e-12 8.001948e-07  NA
2012-11-01   0 3.560373e-06 1.944961e-05 2.615387e-10 1.168389e-11 2.971477e-07  NA
2012-12-01   0 3.717777e-05 2.655440e-05 1.271886e-10 1.814869e-11 4.312436e-07  NA
2013-01-01   0 2.042867e-05 3.268476e-05 2.806455e-10 7.540331e-11 1.231438e-06  NA
2013-02-01   0 4.134729e-07 1.129013e-04 1.186146e-10 1.983651e-11 3.263780e-07  NA
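
The options mentioned above, and the question's worry about whether the window looks "around" or "behind" each date, can be sketched on a small series. This is only an illustration: `idx`, `z1` and the result names are hypothetical, and the six values are made up. With the full monthly data, the width would be 12 instead of 3, e.g. `rollapplyr(z, 12, var, fill = NA)`.

```r
library(zoo)

# Hypothetical toy series: six monthly values (not from the question's data).
idx <- seq(as.Date("2012-04-01"), by = "month", length.out = 6)
z1 <- zoo(c(1, 2, 4, 7, 11, 16), idx)

# Right-aligned window: each result uses the current month and the 2 before it.
v_right <- rollapplyr(z1, 3, var, fill = NA)

# Centered window (the default alignment of rollapply): uses the months on
# both sides of each date, which is what the question wanted to avoid.
v_center <- rollapply(z1, 3, var, fill = NA)

# partial = TRUE: shorter windows are used at the start instead of padding
# with NA. A one-value window still gives NA, since var() needs 2 values.
v_partial <- rollapplyr(z1, 3, var, partial = TRUE)

# Convert the zoo result to a data frame with the dates in an Index column.
df_right <- fortify.zoo(v_right)
```

The right-aligned and centered results contain the same variances, only attached to different dates, so the choice of alignment decides which month each 12-month variance is reported against.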

We can plot the log of the variances like this:

library(ggplot2)
autoplot(log(zv), facet = NULL) + geom_point() + ylab("log(var(.))")

Note

We assume that the starting point is the data frame generated reproducibly below:

Lines <- "Date       USA       ARG       BRA         CHL          COL        MEX PER
2012-04-01   1 0.2271531 0.4970299 0.001956865 0.0005341452 0.07341428  NA
2012-05-01   1 0.2218906 0.4675895 0.001911405 0.0005273186 0.07026524  NA
2012-06-01   1 0.2054076 0.4531661 0.001891352 0.0005292575 0.06897811  NA
2012-07-01   1 0.2033470 0.4596730 0.001950686 0.0005312600 0.07269619  NA
2012-08-01   1 0.1993882 0.4596039 0.001980537 0.0005271514 0.07268987  NA
2012-09-01   1 0.1967152 0.4593390 0.002011212 0.0005305549 0.07418838  NA
2012-10-01   1 0.1972730 0.4597584 0.002002203 0.0005284380 0.07428555  NA
2012-11-01   1 0.1937618 0.4519187 0.001979805 0.0005238670 0.07329656  NA
2012-12-01   1 0.1854037 0.4500448 0.001993309 0.0005323795 0.07453949  NA
2013-01-01   1 0.1866007 0.4607501 0.002013112 0.0005412329 0.07551040  NA
2013-02-01   1 0.1855950 0.4712956 0.002011067 0.0005359562 0.07554661  NA"
DF <- read.table(text = Lines, header = TRUE)
G. Grothendieck
  • It worked. Your answer is really comprehensive and easy to understand. – Jeremy S. Sep 01 '19 at 20:19
  • I made some progress and have another small question about it: How do I apply a correlation function? I tried it two different ways. ``` new <- data.frame(geldbasis$Date, geldbasis$ARG,forward1[,3]) z <- read.zoo(new) zv <- rollapplyr(new, 12, cor(new$Geldbasis,new$Forward, method = "spearman")) new <- data.frame(zv) ``` & ``` zv <- rollapplyr(new$Geldbasis,new$Forward, 12, cor) ``` – Jeremy S. Sep 01 '19 at 22:06
  • 1
    Here is an example of computing the rolling correlation of ARG and BRA: `rollapplyr(z[, c("ARG", "BRA")], 3, function(x) cor(x[, 1], x[, 2]), by.column = FALSE, fill = NA)` – G. Grothendieck Sep 02 '19 at 02:15
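
For the Spearman variant asked about in the comment above, a minimal sketch along the same lines might look like this. The object names `z2`, `roll_spearman` and `rc` are hypothetical, the values are the first six ARG and BRA rows from the question rounded to four decimals, and the window is 3 only because the sample is short (12 would be used on the full data).

```r
library(zoo)

# First six ARG and BRA values from the question, rounded to 4 decimals.
idx <- seq(as.Date("2012-04-01"), by = "month", length.out = 6)
z2 <- zoo(cbind(ARG = c(0.2272, 0.2219, 0.2054, 0.2033, 0.1994, 0.1967),
                BRA = c(0.4970, 0.4676, 0.4532, 0.4597, 0.4596, 0.4593)),
          idx)

# With by.column = FALSE the function receives the whole window as a matrix,
# so both columns are available at once.
roll_spearman <- function(x) cor(x[, 1], x[, 2], method = "spearman")

# Right-aligned rolling Spearman correlation over 3 months.
rc <- rollapplyr(z2, 3, roll_spearman, by.column = FALSE, fill = NA)
```

The key difference from the attempts in the comment is that `rollapplyr` is given a function to call on each window, rather than the result of a single `cor()` call on the full columns.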