I'm having an issue in R when running cor.test on a data frame that contains multiple groups.

I am trying to obtain the correlation coefficient for one dependent variable and multiple independent variables contained in a data frame. The data frame has 2 grouping columns for subsetting the data. Here is an example:

DF <- data.frame(group1=rep(1:4,3),group2=rep(1:2,6),x=rnorm(12),v1=rnorm(12),v2=rnorm(12),v3=rnorm(12))

I created the following script, which uses plyr to calculate the correlation coefficient for each group and then loops through each of the variables.

library(plyr)

group_cor <- function(DF,x,y)
{
  return(data.frame(cor = cor.test(DF[,x], DF[,y])$estimate))
}

resultDF <- ddply(DF, .(group1,group2), group_cor,3,4)

for(i in 5:6){
  resultDF2 <- ddply(DF, .(group1,group2), group_cor,3,i)
  resultDF <- merge(resultDF,resultDF2,by=c("group1","group2")) 
  rm(resultDF2)
}

This works fine. The problem I'm running into occurs when a group doesn't have enough values to calculate the correlation coefficient. For example, when I change the data frame created above to include a few key NA values and then run the same loop:

DF[c(2,6,10),5]=NA

for(i in 5:6){
  resultDF2 <- ddply(DF, .(group1,group2), group_cor,3,i)
  resultDF <- merge(resultDF,resultDF2,by=c("group1","group2")) 
  rm(resultDF2)
}

I get the following error: "Error: not enough finite observations"

I understand why I get this error and am not expecting a correlation coefficient for these cases. What I would like to do is return a null value and move on to the next group instead of having my code stop at an error.

I've tried using a wrapper with try() but can't seem to pass that variable into my result data frame.

Any help on how to get around this would be much appreciated.

1 Answer

I invariably forget how to use try if I haven't used it in, oh, a day or something. This link helped me remember the basics.

For your function, you could add it in like this:

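group_cor = function(DF,x,y) {
    # try() returns an object of class "try-error" if cor.test fails
    check = try(cor.test(DF[,x], DF[,y])$estimate, silent = TRUE)
    # return a result only on success; a failed group returns NULL
    if(!inherits(check, "try-error"))
        return(data.frame(cor = check))
}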

However, this won't return anything for the group with the error. That's actually OK if you set all = TRUE when you merge, because the missing group is then filled in with NA. Here's another way to merge, saving everything into a list with lapply and then merging with Reduce.

allcor = lapply(4:6, function(i) ddply(DF, .(group1,group2), group_cor, 3, i))

Reduce(function(...) merge(..., by = c("group1", "group2"), all = TRUE), allcor)
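
To see why the all argument matters, here is a minimal base R sketch with two made-up result tables (the names full and partial and the numbers are illustrative only, not output from the data above). The second table is missing a group, as it would be when group_cor returns nothing for it.

```r
# Made-up per-variable results; "partial" lacks the group (2, 2)
full    <- data.frame(group1 = c(1, 2), group2 = c(1, 2), cor = c(0.4, -0.3))
partial <- data.frame(group1 = 1, group2 = 1, cor = 0.7)

# The default merge keeps only groups present in both tables
inner <- merge(full, partial, by = c("group1", "group2"))

# all = TRUE keeps every group and fills the gap with NA
outer <- merge(full, partial, by = c("group1", "group2"), all = TRUE)

nrow(inner)
nrow(outer)
outer$cor.y
```

With the default merge the group (2, 2) is dropped entirely, while all = TRUE keeps it with NA in the column that came from the table lacking it.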

If you want to fill in with NA inside the function rather than waiting to fill in using merge, you could change your function to:

group_cor2 = function(DF,x,y) {
    check = try(cor.test(DF[,x], DF[,y])$estimate, silent = TRUE)
    # on failure, return NA for this group instead of stopping
    if(inherits(check, "try-error"))
        return(data.frame(cor = NA))
    # on success, reuse the estimate that was already computed
    return(data.frame(cor = check))
}
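
The same idea can also be written with tryCatch, which lets you supply the fallback value directly in an error handler. This is only a sketch: group_cor_safe and the small ok and bad data frames are hypothetical names made up for illustration, and it uses base R only.

```r
# Hypothetical tryCatch variant: return NA when cor.test throws an error
group_cor_safe <- function(DF, x, y) {
  est <- tryCatch(
    unname(cor.test(DF[, x], DF[, y])$estimate),
    error = function(e) NA_real_  # e.g. "not enough finite observations"
  )
  data.frame(cor = est)
}

# Made-up data: "ok" has four complete pairs, "bad" has only one
ok  <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 6, 9))
bad <- data.frame(a = c(1, 2, 3, 4), b = c(NA, NA, NA, 1))

res_ok  <- group_cor_safe(ok, 1, 2)
res_bad <- group_cor_safe(bad, "a", "b")
res_ok
res_bad
```

Because it takes the same arguments and returns a one-row data frame, it should drop into the ddply call in place of group_cor2.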

Finally (and outside the scope of the question), depending on what you are doing with your output, you might consider naming your columns uniquely based on which columns you are doing the cor.test for so merge doesn't name them all with suffixes. There is likely a better way to do this, maybe with merge and the suffixes argument.

group_cor3 = function(DF,x,y) {
    check = try(cor.test(DF[,x], DF[,y])$estimate, silent = TRUE)
    if(!inherits(check, "try-error")) {
        dat = data.frame(cor = check)
        # name the column after the two column indices, e.g. "cor.3.vs.4"
        names(dat) = paste("cor", x, "vs", y, sep = ".")
        dat
    }
}
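
Another option, sketched here under the assumption that the per-variable results are kept in a named list, is to rename the cor column of each table just before merging. The tables r1 and r2 and their numbers are made up for illustration; in practice they would be the ddply outputs.

```r
# Made-up per-variable results that share a "cor" column
r1 <- data.frame(group1 = 1:2, group2 = 1:2, cor = c(0.5, -0.2))
r2 <- data.frame(group1 = 1:2, group2 = 1:2, cor = c(0.1, NA))
allcor <- list(v1 = r1, v2 = r2)

# Rename "cor" in each table after the list name it is stored under
renamed <- Map(function(d, nm) {
  names(d)[names(d) == "cor"] <- paste0("cor.", nm)
  d
}, allcor, names(allcor))

# Merge as before; no suffixes are needed because the names are unique
merged <- Reduce(function(a, b) merge(a, b, by = c("group1", "group2"), all = TRUE),
                 renamed)
names(merged)
```

This keeps the correlation function itself unchanged and moves the naming into the merge step.
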
aosmith