I am learning R and need help extracting the p-value from cor.test when it is applied to split data.

Sample data frame:

Periods     Factor 1    Factor 2
10/31/2007  76      215
10/31/2007  366     384
10/31/2007  194     186
10/31/2007  234     266
10/31/2007  365     236
9/31/2007   400     347
9/31/2007   116     197
9/31/2007   249     275
9/31/2007   132     177
9/31/2007   211     253
8/31/2007   276     67
8/31/2007   224     362
8/31/2007   161     27
8/31/2007   124     263

I created this function to get the correlation coefficient and p-value of two factors from the monthly split data:

IC_cor_test <- function(x1,x2){
  corr <- cor.test(x1, x2, use='complete.obs', method = 'spearman',conf.level = 0.95,exact=FALSE)
  pvalue = corr$p.value
  cor_coef = corr$estimate
  return (c(cor_coef,pvalue))
}

Splitting the data to compute the correlation coefficient of the two factors for each month:

dates <- as.Date(Periods)
r <- ddply(df, "dates", function(IC_cor_test) {
  cor(IC_cor_test$ranked_factor1,IC_cor_test$ranked_factor2)
})

Result: it printed the correlation coefficient, but I also need the corresponding p-value in the next column.

     dates            V1
1   2007-10-31  0.2883066006
2   2007-11-30  0.0216892076
3   2007-12-31 -0.0697973283
4   2008-01-31  0.0343008730
5   2008-02-29  0.0333372672
6   2008-03-31  0.0007681072
7   2008-04-30  0.1196884915
8   2008-05-30  0.2301050604
9   2008-06-30 -0.0248823873
camille
Texan
  • Just FYI, I edited your post and removed the screenshot, since all the requisite code seems to be in the text here (as it should be). Please [see here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more on how you can post an R question that is easy for folks to answer. – camille Jun 13 '18 at 00:59
  • Your problem is not reproducible (your posted code does not generate your posted result). Please post a [minimal, complete, and verifiable example](https://stackoverflow.com/help/mcve) and I'll be happy to help! – De Novo Jun 13 '18 at 01:26
  • Though this should've used `dput()` for reproducibility, I managed to reproduce his example data and problem. Please find my code below to load the data if you need it. – Hack-R Jun 13 '18 at 01:44

3 Answers

You're not actually calling your custom function in your code. Instead, you're using your function's name as the name of the anonymous function's argument, and then calling the regular cor function on it, which returns only the coefficient.
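
The effect can be seen in isolation in the small sketch below. The names double_it, shadowed and called are made up for illustration and are not part of the question's code:

```r
# Hypothetical helper standing in for IC_cor_test
double_it <- function(v) v * 2

# Here 'double_it' is only the NAME of the argument; inside the body it
# refers to the data passed in, so the helper above is never called.
shadowed <- function(double_it) sum(double_it)

# Here the argument is called x and the helper is actually invoked.
called <- function(x) sum(double_it(x))

shadowed_result <- shadowed(c(1, 2, 3))
called_result <- called(c(1, 2, 3))
print(c(shadowed = shadowed_result, called = called_result))
```

Only the second version goes through the helper, which is the pattern needed inside ddply.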

What you want to do is use a variable like x, which specifies each subset of the dataframe, then call your custom function on the data like so:

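library(plyr)

# Parse the period strings into Date objects (format assumed to be month/day/year)
df$dates <- as.Date(df$Periods, format = "%m/%d/%Y")
r <- ddply(df, "dates", function(x) {
  # x is the subset of df for one date; pass its columns to the custom function
  IC_cor_test(x$ranked_factor1, x$ranked_factor2)
})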
Sean Murphy

Is this what you're looking for? (Note: I changed the variable names because of an import error that I didn't fix; my columns are Periods, then Factor, then X1, so map those to your three columns.)

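library(magrittr)
library(dplyr)

# For each period, compute the correlation estimate and its p-value,
# then collapse the repeated per-row values to one row per period.
df %>% select(Periods, Factor, X1) %>%
  group_by(Periods) %>%
  mutate(correl = cor.test(unlist(Factor), unlist(X1))$estimate,
         p_value = cor.test(unlist(Factor), unlist(X1))$p.value) %>%
  select(Periods, correl, p_value) %>% distinct()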
# A tibble: 3 x 3
# Groups:   Periods [3]
  Periods    correl p_value
  <fct>       <dbl>   <dbl>
1 10/31/2007  0.624 0.261  
2 9/31/2007   0.980 0.00338
3 8/31/2007  -0.142 0.858  
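
Note that cor.test defaults to the Pearson method, whereas the function in the question asks for method = 'spearman', so the two give different numbers on the same rows. A minimal sketch using the five 10/31/2007 rows of the sample data (f1 and f2 are illustrative names):

```r
# Factor 1 and Factor 2 for the 10/31/2007 rows of the sample data
f1 <- c(76, 366, 194, 234, 365)
f2 <- c(215, 384, 186, 266, 236)

pearson  <- cor.test(f1, f2)  # default method is "pearson"
spearman <- cor.test(f1, f2, method = "spearman", exact = FALSE)

# Compare the estimates and the p-values side by side
print(c(pearson = unname(pearson$estimate), spearman = unname(spearman$estimate)))
print(c(pearson = pearson$p.value, spearman = spearman$p.value))
```

If the rank correlation is what you want, pass method = "spearman" in both cor.test calls above.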
mysteRious

The way you passed the function to ddply was not correct. You were calling cor() inside an anonymous function whose argument happened to be named IC_cor_test, rather than invoking the function you created earlier.

I fixed that here and tweaked the function definition slightly.

IC_cor_test <- function(x){
  x1 <- x$Factor1
  x2 <- x$Factor2
  corr <- cor.test(x1, x2, use='complete.obs', method = 'spearman',conf.level = 0.95,exact=FALSE)
  pvalue = corr$p.value
  cor_coef = corr$estimate
  return(data.frame(cor_coef=cor_coef,pvalue=pvalue))
}

r <-  ddply(df, "dates", IC_cor_test)
      dates cor_coef     pvalue
 2007-08-31      0.0 1.00000000
 2007-09-30      0.9 0.03738607
 2007-10-31      0.8 0.10408804

Also, 9/31/07 is not a real date, so the example data was not directly usable, but I changed it to 9/30/07 and loaded your example as follows:

df <- read.table(text="Periods     Factor1    Factor2
                      '10/31/2007'  76      215
                      '10/31/2007'  366     384
                      '10/31/2007'  194     186
                      '10/31/2007'  234     266
                      '10/31/2007'  365     236
                      '9/30/2007'   400     347
                      '9/30/2007'   116     197
                      '9/30/2007'   249     275
                      '9/30/2007'   132     177
                      '9/30/2007'   211     253
                      '8/31/2007'   276     67
                      '8/31/2007'   224     362
                      '8/31/2007'   161     27
                      '8/31/2007'   124     263
",header=T)
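
The step that creates the dates column is not shown above. As a rough end-to-end sketch, assuming Periods is in month/day/year format, the whole pipeline could look like the following. It uses base R's split() and lapply() in place of plyr so that it runs without extra packages, and by_date and res are illustrative names:

```r
df <- read.table(text = "Periods Factor1 Factor2
'10/31/2007' 76 215
'10/31/2007' 366 384
'10/31/2007' 194 186
'10/31/2007' 234 266
'10/31/2007' 365 236
'9/30/2007' 400 347
'9/30/2007' 116 197
'9/30/2007' 249 275
'9/30/2007' 132 177
'9/30/2007' 211 253
'8/31/2007' 276 67
'8/31/2007' 224 362
'8/31/2007' 161 27
'8/31/2007' 124 263
", header = TRUE, stringsAsFactors = FALSE)

# Assumed format: month/day/year
df$dates <- as.Date(df$Periods, format = "%m/%d/%Y")

# Return the Spearman coefficient and p-value for one subset as a one-row data frame
IC_cor_test <- function(x) {
  corr <- cor.test(x$Factor1, x$Factor2, method = "spearman", exact = FALSE)
  data.frame(cor_coef = unname(corr$estimate), pvalue = corr$p.value)
}

# Base R stand-in for ddply: split by date, apply, then bind the rows together
by_date <- split(df, df$dates)
res <- do.call(rbind, lapply(by_date, IC_cor_test))
res <- data.frame(dates = as.Date(names(by_date)), res, row.names = NULL)
print(res)
```

With plyr loaded, ddply(df, "dates", IC_cor_test) can replace the last three steps.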
Hack-R