R - Sum Columns with specific prefixes

Question

I have a data.table table in memory. It is made up of many rows with these columns

key, c1.min, c2.min, c3.min, c1.max, c2.max, c3.max, c1.sd, c2.sd, c3.sd

I would like to return a new table of

key, c1, c2, c3

where

c1 = c1.min + c1.max + c1.sd
c2 = c2.min + c2.max + c2.sd
c3 = c3.min + c3.max + c3.sd

Sotos, yes, fixed. @docendodiscimus, I would appreciate the help with the Syntax — wbm, Jun 06 '16 at 10:12
apologies if this is uber simple. Any kind of pointer would be appreciated — wbm, Jun 06 '16 at 10:26
Your downvoted probably because you didn't provide a reproducible example (see [mcve] and [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)) and you didn't show any efforts you made. You should be able to just do something like `dt[, list(c1 = c1.min + c1.max + c1.sd, c2 = c2.min + c2.max + c2.sd, c3 = c3.min + c3.max + c3.sd), by = key]` or similar — talat, Jun 06 '16 at 10:35
ok - fair-enough, though it is part of a bigger problem I am trying to solve. My problem is the attributes are dynamic depending on circumstances so working towards a solution that works for c1, c2, ..., cn. Looks like I will need some lambda function. I will post solution when I have it and you might give me back my 2 points... — wbm, Jun 06 '16 at 10:45
Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — zx8754, Jun 06 '16 at 10:56

score 1 · Accepted Answer · answered Jun 06 '16 at 10:46

Try this example:

library(data.table)

#dummy data
myData <- 
  data.table(
    data.frame(
      key = 1:10,
      c1.min = 1:10,
      c2.min = 10:19,
      c3.min = 100:109,
      c1.max = 1:10,
      c2.max = 1:10,
      c3.max = 1:10,
      c1.sd = 1:10,
      c2.sd = 1:10,
      c3.sd = 1:10))

# using basic regex match
cbind(key = myData$key,
      sapply(c("c1", "c2", "c3"),function(i){
        myColnames <- colnames(myData)
        rowSums(myData[, grepl(i, myColnames), with = FALSE])
      }))

# using manual sum
myData[ , list(key,
               c1 = c1.min + c1.max + c1.sd,
               c2 = c2.min + c2.max + c2.sd,
               c3 = c3.min + c3.max + c3.sd) ]

score 1 · Answer 2 · answered Jun 06 '16 at 11:35

Here is a slightly different option with melt. We specify the patterns in the measure argument, convert to 'long' format, then grouped by 'key' and specifying the .SDcols , get the sum of those columns.

melt(myData, measure = patterns("^c1", "^c2", "^c3"),
  value.name = c('c1', 'c2', 'c3'))[, lapply(.SD, sum) , key, .SDcols = c1:c3]
#    key c1 c2  c3
# 1:   1  3 12 102
# 2:   2  6 15 105
# 3:   3  9 18 108
# 4:   4 12 21 111
# 5:   5 15 24 114
# 6:   6 18 27 117
# 7:   7 21 30 120
# 8:   8 24 33 123
# 9:   9 27 36 126
#10:  10 30 39 129

score 1 · Answer 3 · answered Jun 06 '16 at 11:45

Another alternative:

library(dplyr)
library(tidyr)

myData %>%
  gather(k, v, -key) %>%
  separate(k, into = c("l", "s")) %>%
  group_by(key, l) %>% 
  summarise(value = sum(v)) %>%
  spread(l, value)

Which gives:

#Source: local data frame [10 x 4]
#Groups: key [10]
#
#     key    c1    c2    c3
#*  <int> <int> <int> <int>
#1      1     3    12   102
#2      2     6    15   105
#3      3     9    18   108
#4      4    12    21   111
#5      5    15    24   114
#6      6    18    27   117
#7      7    21    30   120
#8      8    24    33   123
#9      9    27    36   126
#10    10    30    39   129

Arun kumar mahesh · Answer 4 · 2016-06-06T12:06:02.543

You can use base packages

myData$c1 <- apply(myData[ ,c("c1.min","c1.max","c1.sd")] , 1 , sum)

myData$c2 <- apply(myData[ ,c("c2.min","c2.max","c2.sd")] , 1 , sum)

myData$c3 <- apply(myData[ ,c("c3.min","c3.max","c3.sd")] , 1 , sum)

myData <- myData[,c("key","c1","c2","c3")]

print(myData)
   key c1 c2  c3
1    1  3 12 102
2    2  6 15 105
3    3  9 18 108
4    4 12 21 111
5    5 15 24 114
6    6 18 27 117
7    7 21 30 120
8    8 24 33 123
9    9 27 36 126
10  10 30 39 129

Or you can define a function for sum columns

abc <- function(x)apply(x,1,sum)

c1 <- abc(myData[ ,c("c1.min","c1.max","c1.sd")])
c2 <- abc(myData[ ,c("c2.min","c2.max","c2.sd")])
c3 <- abc(myData[ ,c("c3.min","c3.max","c3.sd")])

mydata1 <- as.data.frame(cbind(Key=myData$key,c1,c2,c3)) 

> mydata1
   Key c1 c2  c3
1    1  3 12 102
2    2  6 15 105
3    3  9 18 108
4    4 12 21 111
5    5 15 24 114
6    6 18 27 117
7    7 21 30 120
8    8 24 33 123
9    9 27 36 126
10  10 30 39 129

R - Sum Columns with specific prefixes

4 Answers4