Correlation matrix in panel data in R

Question

I have a time-series panel dataset which is structured in the following way:

df <- data.frame(
  year = c(2012L, 2013L, 2014L, 2012L, 2013L, 2014L),
  id = c(1L, 1L, 1L, 2L, 2L, 2L),
  c = c(11L, 13L, 13L, 16L, 15L, 15L)
)

#>   year id  c
#> 1 2012  1 11
#> 2 2013  1 13
#> 3 2014  1 13
#> 4 2012  2 16
#> 5 2013  2 15
#> 6 2014  2 15

I would like to find the cross-correlation between values in column C given their id number. Something similar to this:

#>     1  2
#> 1   1  0.8
#> 2   0.8  1

I have been using dplyr package to find the cross-correlation between two variables in my panel data but for some reason, I can't do the same for cross correlation in one veriable grouped by id.

If you can include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) it'll be easier for people to help. — austensen, Jun 03 '17 at 20:30
I'm not exactly sure how to do this, but you might want to check out [`widyr`](https://github.com/dgrtwo/widyr) — austensen, Jun 03 '17 at 20:41
When I try to implement the Henrik solution: I get an error which asks me to supply both 'x' and 'y' or a matrix-like 'x'. It might be helpful to note that in my actual problem I have around 12 different id numbers instead of 2 in this example. — Erwin Rhine, Jun 03 '17 at 21:44

score 2 · Answer 1 · answered Jun 03 '17 at 21:06

If you are already using tidyverse tools, you should try widyr.

Its functions reshape to wide, get the correlations, and give you back a tidy data frame again.

(Note I changed the sample data slightly to match akaDrHouse's answer.

df <- data.frame(
  year = c(2012L, 2013L, 2014L, 2012L, 2013L, 2014L),
  id = c(1L, 1L, 1L, 2L, 2L, 2L),
  c = c(11L, 13L, 13L, 16L, 15L, 156L)
)

df
#>   year id   c
#> 1 2012  1  11
#> 2 2013  1  13
#> 3 2014  1  13
#> 4 2012  2  16
#> 5 2013  2  15
#> 6 2014  2 156

widyr::pairwise_cor(df, id, year, c)

#> # A tibble: 2 x 3
#>   item1 item2 correlation
#>   <int> <int>       <dbl>
#> 1     2     1   0.4946525
#> 2     1     2   0.4946525

widyr::pairwise_cor(df, id, year, c, upper = FALSE)

#> # A tibble: 1 x 3
#>   item1 item2 correlation
#>   <int> <int>       <dbl>
#> 1     1     2   0.4946525

Nice functionality @austensen. I've not heard of widyr. – akaDrHouse Jun 03 '17 at 21:09 — akaDrHouse, Jun 03 '17 at 21:09

akaDrHouse · Answer 2 · 2017-06-03T20:53:46.723

Do you mean something like the following? I used the reshape package to cast based on the value of your id, followed by the cor() function in baseR.

> mydf <- data.frame(year=c("12","13","14","12","13","14"),id=c(1,1,1,2,2,2),c=c(11,13,13,16,15,156))
> library(reshape2)
> mydf
  year id   c
1   12  1  11
2   13  1  13
3   14  1  13
4   12  2  16
5   13  2  15
6   14  2 156
> my_wide_data <- dcast(mydf, year~id,value.var="c")
> cor(my_wide_data[,2:3])
          1         2
1 1.0000000 0.4946525
2 0.4946525 1.0000000

So @Henrik's comment was much more simple and elegant, so including here.

> cor(unstack(mydf[ , -1], c ~ id))
          X1        X2
X1 1.0000000 0.4946525
X2 0.4946525 1.0000000

Correlation matrix in panel data in R

2 Answers2

Linked