I have panel data set of firms:
df <- structure(list(id = c("00127264", "00127264", "00127264", "00127264",
"00127264", "00127264", "00127264", "00127264", "00127264", "00127264",
"00127264", "00127264", "00127264", "00127264", "00127264", "00128538",
"00128538", "00128538", "00128538", "00128538", "00128538", "00128538",
"00128538", "00128538", "00128538", "00129879", "00129879", "00129879",
"00129879", "00129879", "00129879", "00129879", "00129879", "00129879",
"00129879", "00132241", "00132241", "00132241", "00132241", "00132241",
"00132241", "00132241", "00132241", "00132241", "00132241", "00132241",
"00132241", "00132241", "00132241", "00132241"), time = c(2003L,
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L,
2013L, 2014L, 2015L, 2016L, 2017L, 2008L, 2009L, 2010L, 2011L,
2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2003L, 2004L, 2005L,
2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2003L, 2004L,
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L,
2014L, 2015L, 2016L, 2017L), sales = c(18778913, 26246705, 24577605,
20555975, 22803119, 30493587, 47409381, 39648917, 24164698, 26667934,
36939340, 37303488, 36095594, 47863204, 81470728, 17082948, 19218374,
17775729, 18719393, 17682127, 17648132, 19868021, 20034845, 20291386,
28511274, 23842198, 33364335, 38006554, 44051316, 41017519, 44559215,
38096697, 39532944, 32250063, 20456725, 36737613, 36788480, 34432314,
45703706, 51318203, 57966879, 57314960, 69108257, 83337772, 95862115,
78796350, 73897366, 122529286, 114051176, 140727472), costs = c(2776879,
6661626, 7383728, 8148280, 6965171, 15952938, 28537059, 20336344,
8049578, 8313115, 17175621, 17864169, 17323966, 25772512, 56918048,
13617240, 14974971, 13919060, 14317811, 13879155, 14374214, 14607183,
14718348, 15511957, 22142396, 21523985, 30354647, 33001065, 38699618,
35369730, 50308253, 37174212, 38743973, 28852158, 16476830, 31420842,
30050214, 28193685, 35918673, 40847638, 45944119, 44448831, 56898404,
70216220, 80454840, 63808983, 60155914, 106046623, 96525104,
119211752)), row.names = c(NA, -50L), class = c("tbl_df", "tbl",
"data.frame"))
As you can see, it has 4 columns: id, time, sales and costs. I would like to calculate correlations between sales and costs for all firms. For example, I would like to calculate correlations between sales of firm with ID 00127264 with costs of all other firms ("00128538" "00129879" "00132241"). Correlation should take into account the time dimension. The panel data set is unbalanced.
I have found similar problem and solution here:
Correlation matrix in panel data in R
but widyr
package can calculate correlations for one value variable only:
widyr::pairwise_cor(sample, id, year, sales)
and I would need something like
widyr::pairwise_cor(sample, id, year, c(sales, costs))
which is not possible. Expected output (correlations are just some random numbers):
from to corr
127264 128538 0,54
127264 129879 0,68
127264 132241 0,78
128538 127264 0,43
128538 129879 0,48
128538 132241 0,17
129879 127264 0,57
129879 128538 0,36
129879 132241 0,89
132241 127264 0,15
132241 128538 0,6
132241 129879 0,8
or it can be a correlation matrix, as I mentioned.