I have the following data frame:

y <- data.frame(group = letters[1:5], a = rnorm(5) , b = rnorm(5), c = rnorm(5), d = rnorm(5) )

How can I get a data frame that gives, for each row, the correlation between columns (a, b) and columns (c, d)?

Something like: sapply(y, function(x) {cor(x[2:3],x[4:5])})

Thank you, S

user602599

3 Answers


You could use apply, row by row, on the numeric columns:

> apply(y[,-1],1,function(x) cor(x[1:2],x[3:4]))
[1] -1 -1  1 -1  1
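
Since the question asks for a data frame rather than a bare vector, here is a minimal sketch that wraps the same apply call. The seed, the result name res and the column name r are assumptions made for illustration; they are not in the original post.

```r
# Rebuild the example data (seed chosen arbitrarily, for reproducibility only)
set.seed(42)
y <- data.frame(group = letters[1:5], a = rnorm(5), b = rnorm(5),
                c = rnorm(5), d = rnorm(5))

# One row per group: correlation of (a, b) with (c, d) within that row
res <- data.frame(
  group = y$group,
  r     = apply(y[, -1], 1, function(x) cor(x[1:2], x[3:4]))
)
res
```

Keeping group next to the result makes it easy to merge the correlations back onto y later.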

Or ddply from the plyr package (although this might be overkill; note that if two rows share the same group, a single correlation is computed over the pooled a, b and c, d values of both rows):

> ddply(y,.(group),function(x) cor(c(x$a,x$b),c(x$c,x$d)))
  group V1
1     a -1
2     b -1
3     c  1
4     d -1
5     e  1
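
To illustrate the remark about repeated groups, here is a rough sketch using base R's split and sapply in place of ddply (a swapped-in technique, so it runs without plyr). The data frame y2 and the name per_group are hypothetical.

```r
# Hypothetical data in which group "a" appears twice
set.seed(1)
y2 <- data.frame(group = c("a", "a", "b"),
                 a = rnorm(3), b = rnorm(3), c = rnorm(3), d = rnorm(3))

# Same per-group function as in the ddply call, applied to each group's rows
per_group <- sapply(split(y2, y2$group),
                    function(x) cor(c(x$a, x$b), c(x$c, x$d)))
per_group
```

Group "a" pools its two rows, so its correlation is computed from four observations and is no longer forced to be +1 or -1; group "b" still has only two.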
mathematical.coffee

You can use apply to apply a function to each row (or column) of a matrix, array or data.frame.

apply(
  y[,-1], # Remove the first column, to ensure that u remains numeric
  1,      # Apply the function on each row
  function(u) cor( u[1:2], u[3:4] )
)

(With just 2 observations, the correlation can only be +1 or -1.)
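
The reason is that a straight line always fits two points exactly, so only the sign of the slope is left. A quick check of this, with an arbitrary seed and the hypothetical name r:

```r
# Correlate many random pairs of length-2 vectors
set.seed(1)
r <- replicate(1000, cor(rnorm(2), rnorm(2)))

# Every result should be +1 or -1, up to floating-point rounding
all(abs(abs(r) - 1) < 1e-8)

# Roughly half of the signs should be positive
table(sign(r))
```

If either pair were constant, its standard deviation would be zero and cor would return NA with a warning; with continuous random draws this effectively never happens.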

Vincent Zoonekynd

You're almost there: you just need to use apply instead of sapply, and remove unnecessary columns.

apply(y[-1], 1, function(x) cor(x[1:2], x[3:4]))

Of course, the correlation between two length-2 vectors isn't very informative: it is always +1 or -1.

Hong Ooi