Calculate correlation matrx for panel data in R

Question

I have panel data set of firms:

df <- structure(list(id = c("00127264", "00127264", "00127264", "00127264", 
"00127264", "00127264", "00127264", "00127264", "00127264", "00127264", 
"00127264", "00127264", "00127264", "00127264", "00127264", "00128538", 
"00128538", "00128538", "00128538", "00128538", "00128538", "00128538", 
"00128538", "00128538", "00128538", "00129879", "00129879", "00129879", 
"00129879", "00129879", "00129879", "00129879", "00129879", "00129879", 
"00129879", "00132241", "00132241", "00132241", "00132241", "00132241", 
"00132241", "00132241", "00132241", "00132241", "00132241", "00132241", 
"00132241", "00132241", "00132241", "00132241"), time = c(2003L, 
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 
2013L, 2014L, 2015L, 2016L, 2017L, 2008L, 2009L, 2010L, 2011L, 
2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2003L, 2004L, 2005L, 
2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2003L, 2004L, 
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 
2014L, 2015L, 2016L, 2017L), sales = c(18778913, 26246705, 24577605, 
20555975, 22803119, 30493587, 47409381, 39648917, 24164698, 26667934, 
36939340, 37303488, 36095594, 47863204, 81470728, 17082948, 19218374, 
17775729, 18719393, 17682127, 17648132, 19868021, 20034845, 20291386, 
28511274, 23842198, 33364335, 38006554, 44051316, 41017519, 44559215, 
38096697, 39532944, 32250063, 20456725, 36737613, 36788480, 34432314, 
45703706, 51318203, 57966879, 57314960, 69108257, 83337772, 95862115, 
78796350, 73897366, 122529286, 114051176, 140727472), costs = c(2776879, 
6661626, 7383728, 8148280, 6965171, 15952938, 28537059, 20336344, 
8049578, 8313115, 17175621, 17864169, 17323966, 25772512, 56918048, 
13617240, 14974971, 13919060, 14317811, 13879155, 14374214, 14607183, 
14718348, 15511957, 22142396, 21523985, 30354647, 33001065, 38699618, 
35369730, 50308253, 37174212, 38743973, 28852158, 16476830, 31420842, 
30050214, 28193685, 35918673, 40847638, 45944119, 44448831, 56898404, 
70216220, 80454840, 63808983, 60155914, 106046623, 96525104, 
119211752)), row.names = c(NA, -50L), class = c("tbl_df", "tbl", 
"data.frame"))

As you can see, it has 4 columns: id, time, sales and costs. I would like to calculate correlations between sales and costs for all firms. For example, I would like to calculate correlations between sales of firm with ID 00127264 with costs of all other firms ("00128538" "00129879" "00132241"). Correlation should take into account the time dimension. The panel data set is unbalanced.

I have found similar problem and solution here: Correlation matrix in panel data in R but widyr package can calculate correlations for one value variable only:

widyr::pairwise_cor(sample, id, year, sales)

and I would need something like

widyr::pairwise_cor(sample, id, year, c(sales, costs))

which is not possible. Expected output (correlations are just some random numbers):

from to corr

127264 128538 0,54

127264 129879 0,68

127264 132241 0,78

128538 127264 0,43

128538 129879 0,48

128538 132241 0,17

129879 127264 0,57

129879 128538 0,36

129879 132241 0,89

132241 127264 0,15

132241 128538 0,6

132241 129879 0,8

or it can be a correlation matrix, as I mentioned.

Calculate correlation matrx for panel data in R

0 Answers0