I want to get a correlation matrix across several variables using the tidyverse. However, I want to do this grouped by another column. For example, suppose I have a data frame df with a column year, and I want to see the correlations across V1, V2, V3 by year.
year V1 V2 V3 misc_var
2018 5 6 5 a
2018 4 6 4 b
2018 3 2 3 NA
2013 5 8 2 4
2013 6 3 8 8
2013 4 7 5 NA
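For reproducibility, the table above could be built roughly like this. This is a minimal sketch: storing misc_var as a character column (because it mixes letters and numbers) is my assumption, and the numeric column types are guessed from the printed values.

```r
# Rebuild the example data shown above.
# Assumption: misc_var is character, since it mixes "a"/"b" with "4"/"8".
df <- data.frame(
  year = c(2018, 2018, 2018, 2013, 2013, 2013),
  V1 = c(5, 4, 3, 5, 6, 4),
  V2 = c(6, 6, 2, 8, 3, 7),
  V3 = c(5, 4, 3, 2, 8, 5),
  misc_var = c("a", "b", NA, "4", "8", NA),
  stringsAsFactors = FALSE
)
```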
I tried something along the lines of:
cor_output = df %>%
  group_by(year) %>%
  select(V1, V2, V3, year) %>%
  cor(use = "pairwise.complete.obs")
However, instead of calculating the correlations among V1, V2, and V3 separately for each year, it ignores the grouping and just includes year as another variable in the correlation matrix.
The desired output should look like this (please note that the correlations in the output are made up):
year var V1 V2 V3
2013 V1 1 0.7 0.3
2013 V2 ... 1 ...
...
...
2018 V2 0.6 1 0.7
...
Any thoughts?