A relatively fast approach is to pre-define an index matrix of all row pairs and then use it to subset the data, after converting the data to a data.table. The whole operation should finish in under a minute for a 2000 x 700 matrix.
library(data.table)
setDT(mat)          # convert mat to a data.table by reference
rows <- nrow(mat)
# index matrix of all row pairs (i, j) with i < j;
# takes approx. 6 secs on my crappy system for the ~2 million pairs of 2000 rows
idx <- as.matrix(rbindlist(lapply(1:(rows - 1), function(x)
  rbindlist(lapply((x + 1):rows, function(y) list(x, y))))))
idx
     V1 V2
[1,]  1  2
[2,]  1  3
[3,]  2  3
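For comparison, the same set of (i, j) pairs can also be built with base R's combn(); this is more compact, although I have not benchmarked it against the rbindlist construction above for ~2 million pairs. The resulting matrix has no column names, but since idx is only indexed positionally below, the rest of the code works unchanged.
idx <- t(combn(rows, 2))    # same pairs (i, j) with i < j, as a 2-column matrix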
# subtract each pair of rows in one vectorized step; takes approx. 12 secs for 700 columns
# (see below if this throws "Error: vector memory exhausted (limit reached?)")
mat[idx[, 1], ] - mat[idx[, 2], ]
   V1 V2 V3
1: -4 -1  1
2:  0  1 -3
3:  4  2 -4
If the data is very wide, the subtraction may not fit into memory because it is carried out in one vectorized step. A solution is to split the operation into smaller chunks by cycling through blocks of the index matrix, e.g. as follows (a more explicit variant of the same chunking is sketched after the output below):
# split the index rows into up to 9 start/end blocks and row-bind the partial results
rbindlist(apply(
  cbind(unique(floor(c(1, seq(1, nrow(idx), length.out = 10)[2:9] + 1))),
        unique(floor(seq(1, nrow(idx), length.out = 10)[2:10]))),
  1,
  function(x) mat[idx[x[1]:x[2], 1], ] - mat[idx[x[1]:x[2], 2], ]))
         V1 V2 V3 V1 V2 V3 V1 V2 V3 V1
      1: -4 -1  1 -4 -1  1 -4 -1  1 -4
      2:  0  1 -3  0  1 -3  0  1 -3  0
      3:  0  0  0  0  0  0  0  0  0  0
      4: -4 -1  1 -4 -1  1 -4 -1  1 -4
      5:  0  1 -3  0  1 -3  0  1 -3  0
     ---
1998996:  4  1 -1  4  1 -1  4  1 -1  4
1998997:  0  0  0  0  0  0  0  0  0  0
1998998:  0 -1  3  0 -1  3  0 -1  3  0
1998999: -4 -2  4 -4 -2  4 -4 -2  4 -4
1999000: -4 -1  1 -4 -1  1 -4 -1  1 -4
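The chunk boundaries above are a bit terse; an equivalent, more explicit way to write the same idea is sketched below (n_chunks is a hypothetical parameter, not part of the original code):
n_chunks <- 10
chunks <- split(seq_len(nrow(idx)),
                cut(seq_len(nrow(idx)), n_chunks, labels = FALSE))  # chunk id per index row
res <- rbindlist(lapply(chunks, function(i)
  mat[idx[i, 1], ] - mat[idx[i, 2], ]))
Only the final rbindlist() holds the full result; the temporaries created during the subtraction stay at roughly 1/n_chunks of the full size.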
Data
mat <- structure(list(V1 = c(1L, 5L, 1L), V2 = c(2L, 3L, 1L), V3 = c(3L,
2L, 6L)), class = "data.frame", row.names = c(NA, -3L))
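For reference, the timings quoted above were for roughly 2000 rows and 700 columns; a test object of that size can be simulated along the following lines (an illustrative sketch with made-up values, not the data the timings were actually measured on):
set.seed(42)
mat <- as.data.table(matrix(sample(1:6, 2000 * 700, replace = TRUE), nrow = 2000))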