
Why am I getting different correlations for the same pair of columns (rocky1Rating and rocky2Rating) in the two calls below? The matrix gives 0.6476523, but the two-column call gives 0.6011554.

> cor(finalDB[2:6],use="complete.obs")

             rocky1Rating rocky2Rating rocky3Rating rocky4Rating rocky5Rating
rocky1Rating    1.0000000    0.6476523    0.5435555    0.4964198    0.3483168
rocky2Rating    0.6476523    1.0000000    0.7507204    0.6653651    0.5288312
rocky3Rating    0.5435555    0.7507204    1.0000000    0.7284123    0.5897088
rocky4Rating    0.4964198    0.6653651    0.7284123    1.0000000    0.6006595
rocky5Rating    0.3483168    0.5288312    0.5897088    0.6006595    1.0000000

> cor(finalDB[2],finalDB[3],use = "complete.obs")

             rocky2Rating
rocky1Rating    0.6011554
Marat Talipov
  • Can you please provide a reproducible example? See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Based on what you have shown, my only idea is that each call might be using a different correlation method ("pearson", "kendall", "spearman"). Maybe try explicitly stating which method to use in your code. – Branden Murray Jan 31 '15 at 06:44
  • I explicitly specified "pearson" in both calls and still get the same results. – Varun Poddar Jan 31 '15 at 07:28

1 Answer


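The problem is most likely NA values in your data set. When you pass more than two columns to cor() with use="complete.obs", it keeps only the rows in which none of those columns is missing, so a row with an NA in any of the five rating columns is dropped from every correlation in the matrix. When you pass just two columns, only rows with an NA in one of those two are dropped, so the two calls can be computed on different sets of rows. If you want missing values to be skipped separately for each pair of columns, set use="pairwise.complete.obs". For example: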

set.seed(15)
mm<-matrix(runif(6*6), nrow=6)
mm[cbind(4:6, 1:3)]<-NA

cor(mm, use="complete.obs")
#              [,1]       [,2]        [,3]         [,4]       [,5]        [,6]
# [1,]  1.000000000  0.7577650  0.41079822  0.004065102 -0.9221867  0.86947546
# [2,]  0.757764997  1.0000000 -0.28363801 -0.649441771 -0.4464391  0.98119111
# [3,]  0.410798223 -0.2836380  1.00000000  0.913388689 -0.7314382 -0.09319206
# [4,]  0.004065102 -0.6494418  0.91338869  1.000000000 -0.3904905 -0.49043755
# [5,] -0.922186730 -0.4464391 -0.73143818 -0.390490510  1.0000000 -0.61077597
# [6,]  0.869475459  0.9811911 -0.09319206 -0.490437552 -0.6107760  1.00000000

cor(mm, use="pairwise.complete.obs")
#            [,1]        [,2]        [,3]       [,4]       [,5]       [,6]
# [1,]  1.0000000  0.70156571  0.50955114 -0.2663486 -0.7637746  0.7643575
# [2,]  0.7015657  1.00000000 -0.01542302 -0.2882218 -0.5666432  0.1206862
# [3,]  0.5095511 -0.01542302  1.00000000  0.8922900 -0.8904275 -0.5660903
# [4,] -0.2663486 -0.28822185  0.89229002  1.0000000 -0.4693979 -0.7574680
# [5,] -0.7637746 -0.56664323 -0.89042748 -0.4693979  1.0000000  0.2974870
# [6,]  0.7643575  0.12068622 -0.56609027 -0.7574680  0.2974870  1.0000000

cor(mm[,1], mm[,2], use="complete.obs")
# [1] 0.7015657

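Notice that the last two results match: the [1,2] entry of the pairwise.complete.obs matrix (0.7015657) equals the result of the two-vector call, whereas the complete.obs matrix gives 0.7577650 for the same pair. See the ?cor help page for more information.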
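
To check whether this explains the discrepancy in your own data, one option is to count how many rows each call actually uses and reproduce the correlations by subsetting manually. The following is a minimal sketch using base R only; the data frame `df` and its columns `a`, `b`, `c` are made up for illustration and stand in for `finalDB[2:6]`:

```r
# Hypothetical data: three columns with NAs in different rows
df <- data.frame(
  a = c(1, 2, 3, 4, 5, 6, NA, 8),
  b = c(2, 1, 4, 3, 6, 5, 8, NA),
  c = c(NA, NA, 1, 5, 2, 6, 3, 4)
)

# Rows kept by cor(df, use = "complete.obs"): no NA in ANY column
all_complete <- complete.cases(df)
n_all <- sum(all_complete)    # 4 rows

# Rows kept by cor(df$a, df$b, use = "complete.obs"): no NA in a or b
pair_complete <- complete.cases(df[, c("a", "b")])
n_pair <- sum(pair_complete)  # 6 rows

# Reproduce the a-b correlation on each set of rows by hand
r_all  <- cor(df$a[all_complete],  df$b[all_complete])
r_pair <- cor(df$a[pair_complete], df$b[pair_complete])

# The full matrices, for comparison with the manual results
full_matrix <- cor(df, use = "complete.obs")
pair_matrix <- cor(df, use = "pairwise.complete.obs")
```

Here `full_matrix["a", "b"]` should equal `r_all`, while `pair_matrix["a", "b"]` and `cor(df$a, df$b, use = "complete.obs")` should both equal `r_pair`. If `n_all` and `n_pair` differ on your data, the two calls in the question are computed on different rows.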

MrFlick