I want to calculate a Pearson correlation between several columns. The solution JasonAizkalns posted in this thread is very useful for me.
df %>%
  select_if(is.numeric) %>%
  group_by(year) %>%
  group_map(~ correlate(.x))
Now I'm wondering about two things:
- How can I get p-values?
- Why are some correlation coefficients marked in red? I have not found anything about it in the documentation. Are these already the significant correlations? If yes, which significance level is used?
I am looking for the simplest possible extension of this approach, without having to switch to a completely different method.
Thanks for any tips!
Edit 1 (11/28/22): My grouping variable ("trainingsmodus") is a character variable, and I get the following error message, so I have adapted my code.
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `trainingsmodus` is not found.
Backtrace:
- ... %>% ...
- dplyr:::group_by.data.frame(., trainingsmodus)
My adapted code:
df %>%
  select_if(is.character) %>%
  group_by(year) %>%
  group_map(~ correlate(.x)) %>%
  add_column(year)
Even when I create the grouping variable as a numeric variable, the results for both groups are exactly identical, which makes no sense. Does anyone have a tip on how I can correct the code?
Edit 2 (11/28/22): Reproducible example of my df and the code:
df <- data.frame(year = c("lorem", "ipsum", "lorem", "ipsum"),
                 var1 = 4:7,
                 var2 = 5:8,
                 var3 = 6:9,
                 var4 = 7:10)
library(rstatix)
df %>%
  select_if(is.character) %>%
  group_by(year) %>%
  group_map(~ cor_test(df,
                       vars = c("var1", "var2", "var3", "var4"),
                       vars2 = c("var1", "var2", "var3", "var4")) %>%
              filter(is.finite(statistic)))
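
For reference, here is a minimal sketch of the per-group pattern the code above is aiming at. It is not the rstatix or corrr approach: it swaps in base R's cor.test() inside dplyr's group_modify(), simply because that returns a p-value directly. It assumes the built-in iris data as a stand-in for df, with Species playing the role of the grouping column. Two details seem to matter: the grouping column has to survive until group_by() (so it must not be dropped by select_if()), and the function inside the lambda has to receive .x (the rows of the current group) rather than df (the whole data frame). Passing df would explain identical results in every group. Note also that a Pearson test needs at least three complete observations per group, so a four-row example with two rows per group is too small to test.

```r
library(dplyr)

# iris is a built-in data set, used here only as a stand-in for df;
# Species plays the role of the character grouping column.
res <- iris %>%
  group_by(Species) %>%   # the grouping column must still be present here
  group_modify(~ {
    # .x holds only the rows of the current group (without the grouping column)
    ct <- cor.test(.x$Sepal.Length, .x$Sepal.Width, method = "pearson")
    tibble(
      cor       = unname(ct$estimate),   # Pearson correlation coefficient
      statistic = unname(ct$statistic),  # t statistic
      p         = ct$p.value             # p-value of the test
    )
  }) %>%
  ungroup()

print(res)
```

The result has one row per group, with the group label in the first column, so no separate add_column() step should be needed. For more than one pair of variables, the body of the lambda would have to loop over the pairs of interest.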