
I'm trying to get the Pearson correlation coefficient between two variables in R. This is the scatterplot of the variables:

ggplot(results_summary, aes(x =D_in, y = D_ex)) + geom_point(col=ifelse(results_summary$FDR < 0.05, ifelse(results_summary$logF>0, "red", "green" ), "black"))

[Image: scatterplot of D_in against D_ex]

As you can see, the variables correlate pretty well, so I'm expecting a high correlation coefficient. However, when I try to get the Pearson correlation coefficient, I get NaN!

> cor(results_summary$D_in, results_summary$D_ex, method="spearman")
[1] 0.868079
> cor(results_summary$D_in, results_summary$D_ex, method="kendall")
[1] 0.6973086
> cor(results_summary$D_in, results_summary$D_ex, method="pearson")
[1] NaN

I checked if my data contains any NaN:

> nrow(subset(results_summary, is.nan(results_summary$D_ex)==TRUE)) 
[1] 0
> nrow(subset(results_summary, is.nan(results_summary$D_in)==TRUE)) 
[1] 0
> cor(results_summary$D_in, results_summary$D_ex, method="pearson", use="complete.obs")
[1] NaN

But that does not seem to be the cause of the resulting NaN. Can someone give me a clue about what might be happening here?

Thanks for your time!

Geparada

1 Answer


That seems odd. My guess is that there is some problem with the input data that was not revealed by the check you mentioned. I suggest running:

any(!is.finite(results_summary$D_in))

any(!is.finite(results_summary$D_ex))
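The reason this check can find something the `is.nan` check in the question missed is that `Inf` and `-Inf` are neither `NaN` nor `NA`, while `is.finite` returns FALSE for all of them. A minimal sketch, using a hypothetical toy vector `x` in place of the real `results_summary$D_in`:

```r
# Hypothetical toy vector standing in for results_summary$D_in
x <- c(0.1, 0.5, Inf, 0.9, -Inf)

any(is.nan(x))      # FALSE: Inf and -Inf are not NaN
any(is.na(x))       # FALSE: they are not NA either
any(!is.finite(x))  # TRUE: is.finite() is FALSE for Inf, -Inf, NA and NaN

# Locate and count the offending entries
bad_idx <- which(!is.finite(x))
bad_idx             # positions 3 and 5
length(bad_idx)     # 2
```

On the real data, `which(!is.finite(...))` would show which rows are affected, which may help trace where the infinite values come from.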

You could also try calculating Pearson's correlation by hand, to see where the problem lies (in the numerator, the denominator, or both):

pearson_num = cov(results_summary$D_in, results_summary$D_ex, use="complete.obs")

pearson_den = c(sd(results_summary$D_in), sd(results_summary$D_ex))
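To illustrate what this by-hand check can reveal, here is a rough sketch on made-up data. The data frame `df` and its values are assumptions for illustration only, with a single `Inf` row planted on purpose; they are not the original `results_summary`. It also shows one possible way to recompute the coefficient after keeping only rows where both columns are finite.

```r
# Hypothetical toy data standing in for results_summary (not the original values)
df <- data.frame(D_in = c(0.1, 0.4, 0.5, Inf, 0.9),
                 D_ex = c(0.2, 0.3, 0.6, Inf, 0.8))

# Numerator and denominator of Pearson's r, computed separately
num <- cov(df$D_in, df$D_ex)
den <- sd(df$D_in) * sd(df$D_ex)
c(num = num, den = den)   # both are non-finite because of the Inf row

# Rank-based methods only use the ordering of the values,
# so an Inf entry is simply treated as the largest observation
rho <- cor(df$D_in, df$D_ex, method = "spearman")

# Keep only rows where both columns are finite, then recompute by hand
ok <- is.finite(df$D_in) & is.finite(df$D_ex)
num_ok <- cov(df$D_in[ok], df$D_ex[ok])
den_ok <- sd(df$D_in[ok]) * sd(df$D_ex[ok])
r_by_hand <- num_ok / den_ok

# The by-hand value should agree with cor() on the filtered data
all.equal(r_by_hand, cor(df$D_in[ok], df$D_ex[ok], method = "pearson"))
```

This would also explain why the Spearman and Kendall results in the question look fine while Pearson does not: the mean and variance that Pearson relies on are undefined once an infinite value is present.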

tguzella
  • `> any(!is.finite(results_summary$D_in)) [1] TRUE > any(!is.finite(results_summary$D_ex)) [1] TRUE pearson_num [1] NaN pearson_den [1] NaN NaN` It seems you have found the problem. So I have "infinite" values, and that is the cause, right? If so, any advice on how to fix it? – Geparada Aug 06 '15 at 15:48
  • To simply drop those values, using essentially the command from your question: `with(subset(results_summary, (is.finite(D_in)) & (is.finite(D_ex))), cor(D_in, D_ex, method = "pearson", use = "complete.obs"))` – tguzella Aug 06 '15 at 16:04
  • Worked perfectly! Thanks!! – Geparada Aug 07 '15 at 12:40