
I want to compute pairwise correlations by group (year) for a large number of observations.

Without a for loop, I get the result I want:

ddply(mydata, .(year), summarise, corr=cor(x, y, use="pairwise.complete.obs"))  

The result I want:

1   1 0.8366892
2   2 0.8929666
3   3 0.8364396
4   4 0.6201038
5   5 0.8914541

But when I use a for loop to run through the columns of my data set, like this:

for (i in 1:length(x))
  ddply(mydata, .(year), summarise, corr=cor(x[[i]], y[[i]], use="pairwise.complete.obs"))

I get:

 grp     corr
1   1 0.835378
2   2 0.835378
3   3 0.835378
4   4 0.835378
5   5 0.835378

That is the same value for every year, which appears to be the average correlation across the different years.

Am I misunderstanding how ddply works?

ekad
  • Answering questions like this one is always easier if you supply a reproducible example, i.e. a minimal dataset required to run your code and reproduce the output. See [how to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – talat Mar 04 '15 at 17:12
  • @Elisenda If you have multiple columns (this is not very clear from the question), you may want to check `rcorr` from `library(Hmisc)`, i.e. `dlply(mydata, .(year), function(x) rcorr(as.matrix(x[,2:ncol(x)]), type="pearson"))`, assuming that the `year` column is the first column in your dataset – akrun Mar 04 '15 at 17:59
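
One possible reading of the identical values in the question is that `x[[i]]` and `y[[i]]` are found outside `mydata` (for example as separate data frames in the global environment), so every year gets the correlation of the full, ungrouped columns. The sketch below is only an illustration under that assumption: the data, the column names `x1`, `x2`, `y1`, `y2`, and the vectors `x_cols` and `y_cols` are hypothetical and not from the original post. It passes `ddply` a function that receives each year's subset, and stores one result per column pair.

```r
library(plyr)

# Hypothetical data: a grouping column and two x/y column pairs.
set.seed(1)
mydata <- data.frame(year = rep(1:5, each = 20),
                     x1 = rnorm(100), x2 = rnorm(100))
mydata$y1 <- mydata$x1 + rnorm(100, sd = 0.5)
mydata$y2 <- mydata$x2 + rnorm(100, sd = 0.5)

x_cols <- c("x1", "x2")  # assumed column names
y_cols <- c("y1", "y2")

results <- vector("list", length(x_cols))
for (i in seq_along(x_cols)) {
  # d is the subset of mydata for a single year, so the columns are
  # taken from that group rather than from the global environment.
  results[[i]] <- ddply(mydata, .(year), function(d) {
    data.frame(corr = cor(d[[x_cols[i]]], d[[y_cols[i]]],
                          use = "pairwise.complete.obs"))
  })
}
names(results) <- paste(x_cols, y_cols, sep = "_")

print(results[["x1_y1"]])
```

Each element of `results` is a data frame with one row per year. The assignment inside the loop matters as well: a bare `ddply(...)` call inside `for` is not auto-printed, so its value has to be stored or printed explicitly.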
