I have an 18x18 data frame and I would like to compare all possible pairs of columns with each other, so that for each pair of columns the values in the 18 rows are compared row by row.

Since my data is too big to put it here, I wrote a small example of what I have come up with so far:

> a <- c(1:18)
> b <- c(18:1)
> c <- c(1:9,18:10)
> data <- as.data.frame(matrix(c(a,b,c), ncol = 3, nrow = 18))
> data
   V1 V2 V3
1   1 18  1
2   2 17  2
3   3 16  3
4   4 15  4
5   5 14  5
6   6 13  6
7   7 12  7
8   8 11  8
9   9 10  9
10 10  9 18
11 11  8 17
12 12  7 16
13 13  6 15
14 14  5 14
15 15  4 13
16 16  3 12
17 17  2 11
18 18  1 10

Say I would like to compare column V1 with V3: rows where both columns hold the same value are assigned a 0, rows where the value of the first column (V1) is greater are assigned a 1, and rows where the value of the second column (V3) is greater are assigned a 2. I can do this manually for each pair with the following code, which stores the counts in a new data frame freqcomp:

> freqcomp <- as.data.frame(table(ifelse(data[,1]==data[,3],0,ifelse(data[,1]>data[,3],1,ifelse(data[,1]<data[,3],2,NA)))))
> freqcomp
  Var1 Freq
1    0   10
2    1    4
3    2    4

How could I automate this comparison for all the columns I have? Is there a nice for loop to run over all columns, or any other function I could use?

jaspb

1 Answer

You can use combn() to enumerate all pairs of columns and apply() to run the comparison on each pair:

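# Each column of combn(...) holds the indices of one pair of data columns
apply(combn(seq_along(data), 2), 2, function(x) {
    as.data.frame(table(
        # sign() gives 0 (equal), 1 (first greater) or -1 (second greater), relabelled as 0, 1, 2
        factor(sign(data[,x[1]] - data[,x[2]]), levels=c(0,1,-1), labels=c(0,1,2))
    ))
})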

(Split across multiple lines for readability.)

For the example data it gives me one data frame per pair, in the order V1 vs V2, V1 vs V3, V2 vs V3:

[[1]]
  Var1 Freq
1    0    0
2    1    9
3    2    9

[[2]]
  Var1 Freq
1    0   10
2    1    4
3    2    4

[[3]]
  Var1 Freq
1    0    0
2    1    9
3    2    9
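
To see why the sign() and factor() combination reproduces the 0/1/2 coding from the question, it may help to run it on two short vectors. This is only an illustrative sketch: the vectors x and y and the names s, coded and counts are made up for the example and are not part of the data above.

```r
# Two made-up vectors, only for illustration
x <- c(5, 2, 7, 4)
y <- c(5, 3, 1, 9)

# sign() of the difference: 0 if equal, 1 if x is greater, -1 if y is greater
s <- sign(x - y)

# Relabel 0 -> "0", 1 -> "1", -1 -> "2". Listing all three levels explicitly
# keeps a zero count in the table even when one of the cases never occurs.
coded <- factor(s, levels = c(0, 1, -1), labels = c(0, 1, 2))

counts <- table(coded)
print(counts)
```

Declaring the levels up front is what makes the first and third results above show a row with Freq 0 for the "equal" case instead of dropping it.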

EDIT: Naming the first column of each data.frame after the pair of column numbers being compared is easy:

apply(combn(1:length(data), 2), 2, function(x) {
    result <- as.data.frame(table(
        factor(sign(data[,x[1]] - data[,x[2]]), levels=c(0,1,-1), labels=c(0,1,2))
    ))
    colnames(result)[1] <- paste(x, collapse="|")
    return(result)
})
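
If the column names are preferred over the column numbers, one possible variation is to look up names(data) for each pair and to use the same label for the list elements. This is a sketch under assumptions rather than the code above: it uses combn() with simplify = FALSE and lapply() instead of apply(), and the names pairs, compare_pair and results are introduced here purely for illustration.

```r
# Example data from the question
data <- as.data.frame(matrix(c(1:18, 18:1, 1:9, 18:10), ncol = 3, nrow = 18))

# All pairs of column indices, as a list of length-2 integer vectors
pairs <- combn(seq_along(data), 2, simplify = FALSE)

# Hypothetical helper: tabulate one pair and label it with the column names
compare_pair <- function(x) {
    result <- as.data.frame(table(
        factor(sign(data[[x[1]]] - data[[x[2]]]), levels = c(0, 1, -1), labels = c(0, 1, 2))
    ))
    colnames(result)[1] <- paste(names(data)[x], collapse = "|")
    result
}

results <- lapply(pairs, compare_pair)

# Name each list element after its pair, e.g. "V1|V3"
names(results) <- sapply(pairs, function(x) paste(names(data)[x], collapse = "|"))

print(results[["V1|V3"]])
```

Individual comparisons can then be pulled out by name, for example results[["V1|V3"]], rather than by position in the list.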
Theodore Lytras
  • Thanks a lot! That works like a charm. How would I change the Var1 column name to a combination of the names of the two columns that are being compared? Or a substring of that? Or even just the numbers of the columns in the data frame? – jaspb Mar 12 '13 at 22:12