3

I am trying to create a series of variables, c1, c2, and c3, based on the values of two sets of variables, a1, a2, and a3, and b1, b2, and b3. The code below shows a hard-coded solution, but in reality I don't know the total number of set of variables, say an and bn. As you can see the name of the c variables depend on the names of the a and b variables.

Is there a way in data.table to do this? I tried to do it by using purrr::map2 within data.table but I could not make it work. I would highly appreciate your help.

Thanks.

library(data.table)

DT <- data.table(
  a1 = c(1, 2, 3),
  a2 = c(1, 2, 3)*2,
  a3 = c(1, 2, 3)*3,
  b1 = c(5, 6, 7),
  b2 = c(5, 6, 7)*4,
  b3 = c(5, 6, 7)*5
)
DT[]
#>    a1 a2 a3 b1 b2 b3
#> 1:  1  2  3  5 20 25
#> 2:  2  4  6  6 24 30
#> 3:  3  6  9  7 28 35

DT[,
   `:=`(
     c1 = a1 + b1,
     c2 = a2 + b2,
     c3 = a3 + b3
   )
   ]
DT[]
#>    a1 a2 a3 b1 b2 b3 c1 c2 c3
#> 1:  1  2  3  5 20 25  6 22 28
#> 2:  2  4  6  6 24 30  8 28 36
#> 3:  3  6  9  7 28 35 10 34 44

Created on 2020-08-26 by the reprex package (v0.3.0)

4 Answers4

4

This first part is mostly defensive, guarding against: a* variables without matching b* variables; vice versa; and different order of each:

anames <- grep("^a[0-9]+$", colnames(DT), value = TRUE)
bnames <- grep("^b[0-9]+$", colnames(DT), value = TRUE)
numnames <- gsub("^a", "", anames)
anames <- sort(anames[gsub("^a", "", anames) %in% numnames])
bnames <- sort(bnames[gsub("^b", "", bnames) %in% numnames])
cnames <- gsub("^b", "c", bnames)

If you know the number ranges a priori and want something less-dynamic but more straight-forward, then

anames <- paste0("a", 1:3)
bnames <- paste0("b", 1:3)
cnames <- paste0("c", 1:3)

Now the magic:

DT[, (cnames) := Map(`+`, mget(anames), mget(bnames)) ]
DT
#    a1 a2 a3 b1 b2 b3 c1 c2 c3
# 1:  1  2  3  5 20 25  6 22 28
# 2:  2  4  6  6 24 30  8 28 36
# 3:  3  6  9  7 28 35 10 34 44
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • 2
    In a similar vein: `DT[ , paste0("c", seq(ncol(DT)/2)) := Map(`+`, mget(ls(pattern = "a")), mget(ls(pattern = "b")))]` – Henrik Aug 26 '20 at 21:19
4

You could tackle this issue if you split DT column-wise by the pattern of the names first, and then aggregate it

# removes numbers from col names
(ptn <- sub("\\d", "", names(DT))) 
# [1] "a" "a" "a" "b" "b" "b"

# get unique numbers contained in the col names (as strings but it doesn't matter here)
(nmb <- unique(sub("\\D", "", names(DT))))
# [1] "1" "2" "3"

Next step is to split DT and finally do the aggregation

DT[, paste0("c", nmb) := do.call(`+`, split.default(DT, f = ptn))]

Result

DT
#   a1 a2 a3 b1 b2 b3 c1 c2 c3
#1:  1  2  3  5 20 25  6 22 28
#2:  2  4  6  6 24 30  8 28 36
#3:  3  6  9  7 28 35 10 34 44
markus
  • 25,843
  • 5
  • 39
  • 58
2

We can melt to long format, create the column 'c', dcast into 'wide' format and then cbind

library(data.table)
cbind(DT, dcast(melt(DT, measure = patterns('^a', '^b'))[,
    c := value1 + value2], rowid(variable) ~ paste0('c', variable),
        value.var = 'c')[, variable := NULL])
#   a1 a2 a3 b1 b2 b3 c1 c2 c3
#1:  1  2  3  5 20 25  6 22 28
#2:  2  4  6  6 24 30  8 28 36
#3:  3  6  9  7 28 35 10 34 44
akrun
  • 874,273
  • 37
  • 540
  • 662
2

A base R option

u<-split.default(DT,gsub("\\D","",names(DT)))
cbind(DT,do.call(cbind,Map(rowSums,setNames(u,paste0("c",names(u))))))

which gives

   a1 a2 a3 b1 b2 b3 c1 c2 c3
1:  1  2  3  5 20 25  6 22 28
2:  2  4  6  6 24 30  8 28 36
3:  3  6  9  7 28 35 10 34 44
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81