0

I am trying to create a rowsum in a df as in example:

df1
Name    Ch1    Val1
A        a     x1
A        a     x2
A        b     x3
B        c     x4
B        c     x5
B        d     x6
...

to

df2
Name    Ch1    Val1
A        a     x1+x2
A        b     x3
B        c     x4+x5
B        d     x6
...

I want to get the rowsum of Val1 (numeric!) where Ch1 (character) is unique according to Name (character)! I know I can join Name and Ch1 then do rowsum and then spread it again but is there a more elegant way to do this?

Thanks!

antonina
  • 109
  • 8
  • 3
    So just group by the first 2 columns and take a summary stat (in this case sum)? There should be a lot of SO posts on this already; [here's one](https://stackoverflow.com/q/15933958/5325862) that links to plenty others. If these values are intended to be numeric, it's helpful for your example to reflect that – camille Dec 10 '19 at 18:36
  • 3
    Maybe `aggregate(VAL1 ~ Name + CH1, df1, sum)` ? – GKi Dec 10 '19 at 18:37
  • 1
    no because this only refers to one variable. I am struggeling with summing it according to two groups. – antonina Dec 10 '19 at 19:25
  • 1
    Several of the answers there deal with grouping by multiple columns, and others would be trivial to expand from one column to 2. Searching the R tag for ["is:question calculate sum by group"](https://stackoverflow.com/search?q=%5Br%5D+is%3Aquestion+calculate+sum+by+group) gets dozens of contenders – camille Dec 10 '19 at 20:26

2 Answers2

2

An option would be to group by both 'Name', 'Ch1', and paste the 'Val1' column values

library(stringr)
library(dplyr)
df1 %>%
   group_by(Name, Ch1) %>%
   summarise(Val1 = str_c(Val1, collapse="+"))
akrun
  • 874,273
  • 37
  • 540
  • 662
1

data.table solution

library('data.table')

# Data
df1 <- fread('Name    Ch1    Val1
             A        a     x1
             A        a     x2
             A        b     x3
             B        c     x4
             B        c     x5
             B        d     x6')

# for demonstration purpose, assign some values to Val2
df1[, Val2 := 1:.N]
df1
#    Name Ch1 Val1 Val2
# 1:    A   a   x1    1
# 2:    A   a   x2    2
# 3:    A   b   x3    3
# 4:    B   c   x4    4
# 5:    B   c   x5    5
# 6:    B   d   x6    6

df1[, .(Val2 = sum(Val2)), by = .(Name, Ch1)]
#    Name Ch1 Val2
# 1:    A   a    3
# 2:    A   b    3
# 3:    B   c    9
# 4:    B   d    6
akrun
  • 874,273
  • 37
  • 540
  • 662
Sathish
  • 12,453
  • 3
  • 41
  • 59