How to do row-wise subtraction and replace a specific number with zero?

Question

Step 1: I have a simplified dataframe like this:

df1 = data.frame (B=c(1,0,1), C=c(1,1,0)
  , D=c(1,0,1), E=c(1,1,0), F=c(0,0,1)
  , G=c(0,1,0), H=c(0,0,1), I=c(0,1,0))

  B C D E F G H I
1 1 1 1 1 0 0 0 0
2 0 1 0 1 0 1 0 1
3 1 0 1 0 1 0 1 0

Step 2: I want to do row-wise subtraction for every pair of rows, i.e. (row1 - row2), (row1 - row3) and (row2 - row3):

row1-row2    1  0    1  0    0  -1   0  -1
row1-row3    0  1    0  1   -1   0  -1   0
row2-row3   -1  1   -1  1   -1   1  -1   1

Step 3: Replace every -1 with 0:

row1-row2   1   0   1   0   0   0   0   0
row1-row3   0   1   0   1   0   0   0   0
row2-row3   0   1   0   1   0   1   0   1

Could you show me how to do this?

Somewhat annoyingly, I answered almost the same Q from you a few weeks ago that you accepted: http://stackoverflow.com/questions/7297505/multiply-a-data-frame-row-by-row/7298277#7298277 — Gavin Simpson, Sep 28 '11 at 12:09

score 1 · Accepted Answer · edited May 23 '17 at 12:00

I like using the plyr library for things like this, together with the combn function to generate all possible pairs of rows/columns.

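library(plyr)
# Each column of combos holds one pair of row indices
combos <- combn(nrow(df1), 2)

adply(combos, 2, function(x) {
  # Subtract the second row of the pair from the first
  out <- df1[x[1], ] - df1[x[2], ]
  out[out == -1] <- 0  # replace -1 with 0
  out
})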

Results in:

  X1 B C D E F G H I
1  1 1 0 1 0 0 0 0 0
2  2 0 1 0 1 0 0 0 0
3  3 0 1 0 1 0 1 0 1

If necessary, you can drop the first column (X1); plyr adds it automagically.
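
Dropping that column could look like the sketch below. Here `res` is only a stand-in for the data frame returned by adply, typed in by hand from the output above so the snippet runs without plyr; the name is an assumption, not something from the answer.

```r
# `res` is a hand-typed stand-in for the adply() result shown above
res <- data.frame(X1 = 1:3,
                  B = c(1, 0, 0), C = c(0, 1, 1), D = c(1, 0, 0), E = c(0, 1, 1),
                  F = c(0, 0, 0), G = c(0, 0, 1), H = c(0, 0, 0), I = c(0, 0, 1))

res$X1 <- NULL   # remove the index column that plyr added
res
```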

Similar questions:

answered Sep 27 '11 at 11:37

Chase

Thanks for your wonderful advice. This code works for a test job, but since my data file is very large, adply seems to be very memory demanding, and the waiting time at our supercomputer facility is quite long. Could you give me some more guidance on how to overcome this? – psiu Sep 28 '11 at 04:57

Gavin Simpson · Answer 2 · 2011-09-28T12:34:18.207

For the record, I would do this:

cmb <- combn(seq_len(nrow(df1)), 2)
out <- df1[cmb[1,], ] - df1[cmb[2,], ]
out[out < 0] <- 0
rownames(out) <- apply(cmb, 2, 
                       function(x) paste("row", x[1], "-row", x[2], sep = ""))

This yields (the last line above is a bit of sugar, and may not be needed):

> out
          B C D E F G H I
row1-row2 1 0 1 0 0 0 0 0
row1-row3 0 1 0 1 0 0 0 0
row2-row3 0 1 0 1 0 1 0 1

This is fully vectorised: it uses index vectors to extract (and, where needed, repeat) the rows of df1 required for each row-by-row operation.
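
For large inputs, one variation on the same indexing idea is to work on a numeric matrix rather than a data frame, which usually carries less overhead. This is a minimal sketch, assuming every column of df1 is numeric and there are at least two rows; the names `m` and `res` are illustrative and not part of the answer above.

```r
df1 <- data.frame(B = c(1, 0, 1), C = c(1, 1, 0), D = c(1, 0, 1), E = c(1, 1, 0),
                  F = c(0, 0, 1), G = c(0, 1, 0), H = c(0, 0, 1), I = c(0, 1, 0))

m   <- as.matrix(df1)                 # numeric matrix copy of the data
cmb <- combn(seq_len(nrow(m)), 2)     # one column per pair of rows

# Subtract the paired rows in one step; drop = FALSE keeps a matrix
# even when there is only a single pair.
res <- m[cmb[1, ], , drop = FALSE] - m[cmb[2, ], , drop = FALSE]
res[res < 0] <- 0                     # set negative differences to 0

rownames(res) <- paste0("row", cmb[1, ], "-row", cmb[2, ])
res
```

Note that the number of pairs is n(n-1)/2, so the result grows quadratically with the number of rows whichever container is used.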

NPE · Answer 3 · 2011-09-27T11:33:03.440

> df2 <- rbind(df1[1,]-df1[2,], df1[1,]-df1[3,], df1[2,]-df1[3,])
> df2
    B C  D E  F  G  H  I
1   1 0  1 0  0 -1  0 -1
2   0 1  0 1 -1  0 -1  0
21 -1 1 -1 1 -1  1 -1  1

> df2[df2==-1] <- 0
> df2
   B C D E F G H I
1  1 0 1 0 0 0 0 0
2  0 1 0 1 0 0 0 0
21 0 1 0 1 0 1 0 1

If you'd like to change the name of the rows to those in your example:

> rownames(df2) <- c('row1-row2', 'row1-row3', 'row2-row3')
> df2
          B C D E F G H I
row1-row2 1 0 1 0 0 0 0 0
row1-row3 0 1 0 1 0 0 0 0
row2-row3 0 1 0 1 0 1 0 1

Finally, if the number of rows is not known ahead of time, the following should do the trick:

df1 = data.frame (B=c(1,0,1), C=c(1,1,0), D=c(1,0,1), E=c(1,1,0), F=c(0,0,1), G=c(0,1,0), H=c(0,0,1), I=c(0,1,0))

n <- nrow(df1)
ret <- data.frame()
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    # Difference between row i and row j, labelled e.g. "row1-row2"
    d <- df1[i, ] - df1[j, ]
    rownames(d) <- paste('row', i, '-row', j, sep = '')
    ret <- rbind(ret, d)
  }
}
ret[ret == -1] <- 0  # replace -1 with 0
print(ret)

Thanks for your wonderful advice. This code works for a test job, but since my data file is very large, the for loop seems to be very memory demanding, and the waiting time at our supercomputer facility is quite long. Could you give me some more guidance on how to overcome this? – psiu, Sep 28 '11 at 04:57
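
One likely cost in the loop above is that the data frame is grown with rbind on every iteration, so it is copied again and again. A possible way around this, sketched here under the assumption that all columns are numeric, is to compute one block of differences per row i on a matrix, store the blocks in a preallocated list, and bind them once at the end. The helper name `diff_block` is hypothetical.

```r
df1 <- data.frame(B = c(1, 0, 1), C = c(1, 1, 0), D = c(1, 0, 1), E = c(1, 1, 0),
                  F = c(0, 0, 1), G = c(0, 1, 0), H = c(0, 0, 1), I = c(0, 1, 0))
m <- as.matrix(df1)
n <- nrow(m)

# Hypothetical helper: differences between row i and every later row,
# with negative values set to 0.
diff_block <- function(m, i) {
  j <- seq(i + 1, nrow(m))
  # Repeat row i once per later row, then subtract the later rows
  block <- m[rep(i, length(j)), , drop = FALSE] - m[j, , drop = FALSE]
  block[block < 0] <- 0
  rownames(block) <- paste0("row", i, "-row", j)
  block
}

# Preallocate the list and bind once, instead of growing the result in the loop
blocks <- vector("list", n - 1)
for (i in seq_len(n - 1)) {
  blocks[[i]] <- diff_block(m, i)
}
ret <- do.call(rbind, blocks)
ret
```

If even the full result is too large to hold in memory, each block could be written to disk or summarised inside the loop instead of being stored; whether that helps depends on what is done with the differences afterwards.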
