correlation failure - Pearson

Question

I want to write correlation information to a data file as follows:

korelacja=cor(p2,d2,method="pearson",use = "complete.obs")
korelacja2=cor(p2,d2,method="kendall",use = "complete.obs")
korelacja3=cor(p2,d2,method="spearman",use = "complete.obs")
dane=paste(korelacja,korelacja2,korelacja3,sep=';')
write(dane,file=nazwa,append=TRUE)

The results look strange to me: the Pearson correlation is very high (always equal to one), but the Kendall and Spearman correlations are very low. I created scatterplots and I don't see a linear correlation.

I don't think anyone is going to be able to answer this without more information about the data that you're analyzing. Can you provide a sample? It seems likely that something is going wrong, since a Pearson correlation of exactly one would also imply a Spearman correlation of one. — bnaul, Aug 26 '11 at 05:39
To be more specific: can you at least tell us the results of `str(p2)` and `str(d2)`? If `p2` and `d2` aren't too large, can you show us the results of `dput()`? See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example ... — Ben Bolker, Aug 26 '11 at 14:23

Ben Bolker · Answer 1 · 2011-08-26T16:30:06.723

It's not hard to replicate this pattern if you have some large outliers in your data that dominate the Pearson correlation but are relatively insignificant in the non-parametric (Kendall/Spearman) approaches. For example, here's a concocted data set with nothing going on except for one large outlier:

> set.seed(1001)
> x <- c(runif(1000),1e5)
> y <- c(runif(1000),1e5)
> cor(x,y,method="pearson")
[1] 1
> cor(x,y,method="kendall")
[1] -0.02216583
> cor(x,y,method="spearman")
[1] -0.03335352

This is consistent with your description so far, although you ought in this case to be able to see the outliers in your scatterplots ...
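
If the scatterplots are hard to read, a numeric check may help. The following is only a sketch on the same concocted data, not a diagnosis of your actual `p2` and `d2`: the helper `is_outlier` and its cutoff of 10 are hypothetical choices made for illustration. It flags points that lie far from the median (measured in units of the median absolute deviation) and recomputes the Pearson correlation without them.

```r
# Same concocted data as above: uniform noise plus one extreme point.
set.seed(1001)
x <- c(runif(1000), 1e5)
y <- c(runif(1000), 1e5)

# The summary makes the extreme maximum easy to spot.
print(summary(x))

# Hypothetical helper: flag values far from the median, in units of the MAD.
# The cutoff of 10 is an arbitrary choice for this illustration.
is_outlier <- function(v, cutoff = 10) {
  abs(v - median(v)) / mad(v) > cutoff
}

# Keep only the pairs where neither coordinate is flagged.
keep <- !(is_outlier(x) | is_outlier(y))
n_dropped <- sum(!keep)

# Pearson correlation with and without the flagged points.
pearson_all  <- cor(x, y, method = "pearson")
pearson_kept <- cor(x[keep], y[keep], method = "pearson")

print(c(dropped = n_dropped, all = pearson_all, kept = pearson_kept))
```

If the Pearson value falls from about one to near zero once a handful of points are dropped, outliers are the likely explanation. If it stays at one, something else is going on and a sample of the data would be needed.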
