I'm having an issue in R when running cor.test on a data frame that contains multiple groups.
I am trying to obtain the correlation coefficient between one dependent variable and several independent variables in the same data frame. The data frame has two grouping columns that are used to subset the data. Here is an example:
DF <- data.frame(group1 = rep(1:4, 3), group2 = rep(1:2, 6),
                 x = rnorm(12), v1 = rnorm(12), v2 = rnorm(12), v3 = rnorm(12))
I wrote the following script, which uses plyr to calculate the correlation coefficient within each group and then loops over the remaining variables.
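library(plyr)
# Return the correlation estimate between columns x and y for one group's rows
group_cor <- function(DF, x, y)
{
  return(data.frame(cor = cor.test(DF[, x], DF[, y])$estimate))
}
# Correlate x (column 3) with v1 (column 4), then with v2 and v3 (columns 5 and 6)
resultDF <- ddply(DF, .(group1, group2), group_cor, 3, 4)
for (i in 5:6) {
  resultDF2 <- ddply(DF, .(group1, group2), group_cor, 3, i)
  resultDF <- merge(resultDF, resultDF2, by = c("group1", "group2"))
  rm(resultDF2)
}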
This works fine. The problem arises when a group does not contain enough values to calculate the correlation coefficient. For example, if I change the data frame created above to include a few key NA values and then run the same loop:
DF[c(2, 6, 10), 5] <- NA  # every v2 value in the group1 == 2 group is now NA
for (i in 5:6) {
  resultDF2 <- ddply(DF, .(group1, group2), group_cor, 3, i)
  resultDF <- merge(resultDF, resultDF2, by = c("group1", "group2"))
  rm(resultDF2)
}
I get the following error: "Error: not enough finite observations"
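As far as I can tell, the error comes from the group where all three values of the second variable are NA, so cor.test has no complete pairs to work with. A minimal sketch that reproduces it outside of ddply (the vectors x and y here are made up purely for illustration):

```r
# Three observations, like each group in DF, with the second variable all NA
x <- c(0.5, -1.2, 0.3)
y <- c(NA_real_, NA_real_, NA_real_)

# Capture the error message instead of halting the script
msg <- tryCatch(
  {
    cor.test(x, y)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```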
I understand why I get this error, and I am not expecting a correlation coefficient for these cases. What I would like to do is return a null value and move on to the next group instead of having my code stop at the error.
I've tried wrapping the call in try(), but I can't work out how to get its result into my result data frame.
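To make the desired behaviour concrete, here is a rough sketch of the kind of output I'm after. It uses tryCatch() rather than try(), and group_cor_safe is a hypothetical name I made up for this illustration. To keep it self-contained it uses base R split()/lapply() as a stand-in for the ddply call, so I'm not sure it is the right way to do this with plyr:

```r
# Hypothetical variant of group_cor: returns NA instead of stopping on an error
group_cor_safe <- function(DF, x, y) {
  est <- tryCatch(
    unname(cor.test(DF[, x], DF[, y])$estimate),
    error = function(e) NA_real_  # e.g. "not enough finite observations"
  )
  data.frame(cor = est)
}

set.seed(1)  # seed added only so the illustration is reproducible
DF <- data.frame(group1 = rep(1:4, 3), group2 = rep(1:2, 6),
                 x = rnorm(12), v1 = rnorm(12), v2 = rnorm(12), v3 = rnorm(12))
DF[c(2, 6, 10), 5] <- NA

# Base-R stand-in for ddply(DF, .(group1, group2), group_cor_safe, 3, 5)
pieces <- split(DF, list(DF$group1, DF$group2), drop = TRUE)
result <- do.call(rbind, lapply(pieces, function(d) {
  cbind(d[1, c("group1", "group2")], group_cor_safe(d, 3, 5))
}))
print(result)
```

The group with the NA values gets cor = NA, and the other groups keep their estimates.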
Any help on how to get around this would be much appreciated.