
Using R:

I have a list of n vectors that corresponds, by index, to a vector of n IDs; each vector in the list contains some of the IDs. I also have a vector of values, one per ID:

L1 = c(1,65,23)
L2 = c(1,23,45)
L3 = c(45,23)
L4 = c(45,65)

V2 = list(L1,L2,L3,L4)

IDs = c(1, 23, 45, 65)
Values = c(400, 500, 100, 150)
dat = data.frame(IDs, Values)

For each list element, I would like to subtract the values of the IDs it contains from the value of the corresponding (by index) row. In a loop this would be something like:

testFun = function(dat) {
    y = list()
    for (i in 1:nrow(dat)) {
        y[[i]] = dat$Values[i] - dat$Values[which(dat$IDs %in% V2[[i]])]
    }
    return(y)
}
testFun(dat)
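For the toy data above, a self-contained run of the same loop gives one vector of differences per row of `dat` (differences appear in `dat`'s row order):

```r
V2  <- list(c(1, 65, 23), c(1, 23, 45), c(45, 23), c(45, 65))
dat <- data.frame(IDs = c(1, 23, 45, 65), Values = c(400, 500, 100, 150))

testFun <- function(dat) {
  y <- list()
  for (i in seq_len(nrow(dat))) {
    # logical indexing: rows of dat whose ID occurs in V2[[i]],
    # kept in dat's row order
    y[[i]] <- dat$Values[i] - dat$Values[dat$IDs %in% V2[[i]]]
  }
  y
}

str(testFun(dat))
# List of 4
#  $ : num [1:3] 0 -100 250
#  $ : num [1:3] 100 0 400
#  $ : num [1:2] -400 0
#  $ : num [1:2] 50 0
```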

Basically, this works, but does not scale well. Any help would be much appreciated! Thanks

Sarobinson

2 Answers


An alternative approach is to keep the results in tabular form. Here is a data.table solution:

# convert your data to data.table
library(data.table)
DT <- data.table(dat, key="IDs")

DT[, Values - DT[.(V2[[i]])]$Values , by=list(i=seq(nrow(DT)))]
    i   V1
 1: 1    0
 2: 1  250
 3: 1 -100
 4: 2  100
 5: 2    0
 6: 2  400
 7: 3    0
 8: 3 -400
 9: 4   50
10: 4    0
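The key step here is the keyed join `DT[.(V2[[i]])]`: with a key set on `IDs`, `.(x)` builds a lookup table and the join returns the matching rows in the order of `x` rather than the order of `DT`. A minimal sketch of just that join:

```r
library(data.table)
DT <- data.table(IDs = c(1, 23, 45, 65),
                 Values = c(400, 500, 100, 150),
                 key = "IDs")

# Keyed join: rows come back in the order of the lookup vector,
# which is why the grouped expression above lines up with V2[[i]].
DT[.(c(45, 1))]$Values   # c(100, 400)
```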
Ricardo Saporta
  • or `DT[, list(list(Values - DT[.(V2[[i]])]$Values)) , by=list(i=seq(nrow(DT)))]` to keep it as list. – mnel Jan 23 '14 at 00:40
  • @mnel, yep good point. I think there might be advantage to using `RES[.(3), _expr_]` (where `RES` is the results output). – Ricardo Saporta Jan 23 '14 at 00:43
  • Would this scale? In the dataset I have 175,000 IDs, each associated to a V1 vector with mean length 500 – Sarobinson Jan 23 '14 at 00:45
  • @Sarobinson, only one way to really find out :) Not only should it scale, but it should be much much faster than using `%in%`. That being said, you should benchmark this method with MNEL's suggestion. – Ricardo Saporta Jan 23 '14 at 00:47
  • OK, so the data.table solution is faster than the loop, but is still very slow (~5 mins). Is there another way? `mclapply` maybe? – Sarobinson Jan 23 '14 at 01:50
  • 5 minutes is a long time? But yes, you can use `foreach`. See here: http://stackoverflow.com/a/19205276/1492421 – Ricardo Saporta Jan 23 '14 at 03:51

Here is another data.table solution:

DT <- data.table(dat, key = 'IDs')

DT[, col3 := vector(mode='list',length = nrow(DT))]

for (i in seq_along(V2)){
   set(DT, i = i, j = 'col3', value = list(list(DT[i,Values] - DT[.(V2[[i]])][['Values']])))
}

Note that the `%in%` approach creates a logical vector of length 175,000 on each of the 175,000 iterations. With your current data setup and the outcome you want, that repeated allocation will be your limiting factor for time.
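One way around that repeated allocation (an illustration, not from the original thread) is to flatten `V2` once and do a single vectorised lookup with `match()`, then split the differences back out per row. Differences come out in the order the IDs appear in each `V2[[i]]`, matching the keyed-join output above:

```r
# Example data, as in the question
V2     <- list(c(1, 65, 23), c(1, 23, 45), c(45, 23), c(45, 65))
IDs    <- c(1, 23, 45, 65)
Values <- c(400, 500, 100, 150)

lens  <- lengths(V2)                     # length of each list element
flat  <- unlist(V2, use.names = FALSE)   # all looked-up IDs in one vector
own   <- rep(Values, times = lens)       # each row's own value, repeated
other <- Values[match(flat, IDs)]        # one match() call for everything
res   <- split(own - other, rep(seq_along(V2), lens))
# res[[1]] is c(0, 250, -100): same values as the join, in V2[[1]]'s order
```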

mnel