Step 1: I have a simplified dataframe like this:

df1 <- data.frame(B = c(1, 0, 1), C = c(1, 1, 0),
                  D = c(1, 0, 1), E = c(1, 1, 0), F = c(0, 0, 1),
                  G = c(0, 1, 0), H = c(0, 0, 1), I = c(0, 1, 0))

  B C D E F G H I
1 1 1 1 1 0 0 0 0
2 0 1 0 1 0 1 0 1
3 1 0 1 0 1 0 1 0

Step 2: I want to do row-wise subtraction for every pair of rows, i.e. (row1 - row2), (row1 - row3) and (row2 - row3):

row1-row2    1  0    1  0    0  -1   0  -1
row1-row3    0  1    0  1   -1   0  -1   0
row2-row3   -1  1   -1  1   -1   1  -1   1

Step 3: Replace every -1 with 0:

row1-row2   1   0   1   0   0   0   0   0
row1-row3   0   1   0   1   0   0   0   0
row2-row3   0   1   0   1   0   1   0   1

Could you show me how to do this?

Gavin Simpson
psiu
  • Somewhat annoyingly, I answered almost the same Q from you a few weeks ago that you accepted: http://stackoverflow.com/questions/7297505/multiply-a-data-frame-row-by-row/7298277#7298277 – Gavin Simpson Sep 28 '11 at 12:09

3 Answers


I like the plyr package for tasks like this, with the combn function generating every possible pair of row indices.

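library(plyr)
# Each column of combos holds one pair of row indices
combos <- combn(nrow(df1), 2)

adply(combos, 2, function(x) {
  out <- df1[x[1], ] - df1[x[2], ]  # subtract the second row of the pair from the first
  out[out == -1] <- 0               # replace -1 with 0
  out
})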

Results in:

  X1 B C D E F G H I
1  1 1 0 1 0 0 0 0 0
2  2 0 1 0 1 0 0 0 0
3  3 0 1 0 1 0 1 0 1

If necessary, you can drop the first column (X1); plyr adds it automatically as an index for each pair.
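
To make the role of combn concrete, here is a small sketch of what it returns for a three-row data frame like df1. The literal 3 stands in for nrow(df1), and the name combos simply mirrors the variable used in the answer.

```r
# combn(n, 2) returns a 2 x choose(n, 2) matrix;
# each column is one pair of row indices (first row < second row)
combos <- combn(3, 2)
print(combos)
#      [,1] [,2] [,3]
# [1,]    1    1    2
# [2,]    2    3    3
```

Reading down each column gives the pairs (1, 2), (1, 3) and (2, 3), which is why adply iterates over margin 2 (the columns).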

Chase
  • Thanks for your wonderful advice. This code works for a test job, but since my data file is very large, adply seems to be very memory-demanding, and the waiting time on our supercomputer facility is quite long. Could you give me some more guidance on how to overcome this? – psiu Sep 28 '11 at 04:57

For the record, I would do this:

cmb <- combn(seq_len(nrow(df1)), 2)
out <- df1[cmb[1,], ] - df1[cmb[2,], ]
out[out < 0] <- 0
rownames(out) <- apply(cmb, 2, 
                       function(x) paste("row", x[1], "-row", x[2], sep = ""))

This yields (the last line above is a bit of sugar, and may not be needed):

> out
          B C D E F G H I
row1-row2 1 0 1 0 0 0 0 0
row1-row3 0 1 0 1 0 0 0 0
row2-row3 0 1 0 1 0 1 0 1

This is fully vectorised: indexing with cmb[1, ] and cmb[2, ] extracts (and repeats where needed) the rows of df1 for every pair, so the row-by-row subtraction happens in a single operation.
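
As a possible variation for larger data, the same indexing idea can be applied to a numeric matrix instead of a data frame. This is only a sketch: it assumes every column is numeric, it swaps in pmax() for the replacement step, and whether it actually helps on a very large file has not been measured here. The names m and out are illustrative.

```r
df1 <- data.frame(B = c(1, 0, 1), C = c(1, 1, 0), D = c(1, 0, 1), E = c(1, 1, 0),
                  F = c(0, 0, 1), G = c(0, 1, 0), H = c(0, 0, 1), I = c(0, 1, 0))

m   <- as.matrix(df1)                 # numeric matrix, keeps the column names
cmb <- combn(seq_len(nrow(m)), 2)     # one column per pair of row indices

# Subtract all pairs at once, then clamp negative values to 0 with pmax()
out <- pmax(m[cmb[1, ], , drop = FALSE] - m[cmb[2, ], , drop = FALSE], 0)
rownames(out) <- paste0("row", cmb[1, ], "-row", cmb[2, ])
```

Note that the result still has choose(n, 2) rows, so its size grows quadratically with the number of input rows whichever container is used.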

Gavin Simpson
> df2 <- rbind(df1[1,]-df1[2,], df1[1,]-df1[3,], df1[2,]-df1[3,])
> df2
    B C  D E  F  G  H  I
1   1 0  1 0  0 -1  0 -1
2   0 1  0 1 -1  0 -1  0
21 -1 1 -1 1 -1  1 -1  1

> df2[df2==-1] <- 0
> df2
   B C D E F G H I
1  1 0 1 0 0 0 0 0
2  0 1 0 1 0 0 0 0
21 0 1 0 1 0 1 0 1

If you'd like to change the name of the rows to those in your example:

> rownames(df2) <- c('row1-row2', 'row1-row3', 'row2-row3')
> df2
          B C D E F G H I
row1-row2 1 0 1 0 0 0 0 0
row1-row3 0 1 0 1 0 0 0 0
row2-row3 0 1 0 1 0 1 0 1

Finally, if the number of rows is not known ahead of time, the following should do the trick:

df1 <- data.frame(B=c(1,0,1), C=c(1,1,0), D=c(1,0,1), E=c(1,1,0), F=c(0,0,1), G=c(0,1,0), H=c(0,0,1), I=c(0,1,0))

n <- nrow(df1)
ret <- data.frame()
for (i in seq_len(n - 1)) {        # seq_len() avoids the 1:0 trap when n is 1
  for (j in (i + 1):n) {
    d <- df1[i, ] - df1[j, ]       # named d so it does not mask base::diff()
    rownames(d) <- paste0('row', i, '-row', j)
    ret <- rbind(ret, d)
  }
}
ret[ret == -1] <- 0
print(ret)
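
Growing a data frame with rbind inside a loop copies it on every iteration. One alternative, sketched below, collects the per-pair results in a list and binds them once at the end. It is a variation on the loop above, not a replacement that has been benchmarked; the names pairs and rows are illustrative.

```r
df1 <- data.frame(B = c(1, 0, 1), C = c(1, 1, 0), D = c(1, 0, 1), E = c(1, 1, 0),
                  F = c(0, 0, 1), G = c(0, 1, 0), H = c(0, 0, 1), I = c(0, 1, 0))

# simplify = FALSE returns a list with one c(i, j) vector per pair
pairs <- combn(seq_len(nrow(df1)), 2, simplify = FALSE)

rows <- lapply(pairs, function(p) {
  d <- df1[p[1], ] - df1[p[2], ]
  rownames(d) <- paste0("row", p[1], "-row", p[2])
  d
})

ret <- do.call(rbind, rows)   # bind all one-row data frames in a single call
ret[ret < 0] <- 0
```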
NPE
  • Thanks for your wonderful advice. This code works for a test job, but since my data file is very large, the for loop seems to be very memory-demanding, and the waiting time on our supercomputer facility is quite long. Could you give me some more guidance on how to overcome this? – psiu Sep 28 '11 at 04:57