1

I have a dataframe df_workingFile:

group  | id  | flag | value |
 a     |  a  |  0   |  5    |  
 a     |  c  |  0   |  2    |  
 b     |  b  |  0   |  8    |  
 b     |  e  |  1   |  1    |  

I want to sum the value column based on group, but only on the row where group = id (there can only be one of these instances by group). I also don't want to include any values in the sum that have a flag:

group  | id  | flag | value | ValueSum
 a     |  a  |  0   |  5    |   7
 a     |  c  |  0   |  2    |   0
 b     |  b  |  0   |  8    |   8
 b     |  e  |  1   |  1    |   0

Input data:

df_workingFile <- structure(list(group = structure(c(1L, 1L, 2L, 2L), .Label = c("a", 
"b"), class = "factor"), id = structure(c(1L, 3L, 2L, 4L), .Label = c("a", 
"b", "c", "e"), class = "factor"), flag = c(0, 0, 0, 1), value = c(5, 
2, 8, 1)), .Names = c("group", "id", "flag", "value"), row.names = c(NA, 
4L), class = "data.frame")
lmo
  • 37,904
  • 9
  • 56
  • 69
NBC
  • 1,606
  • 4
  • 18
  • 31
  • 1
    Please `dput(head( df_workingFile))` and paste into question. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – dww Aug 16 '17 at 22:34
  • sure, just did it – NBC Aug 16 '17 at 22:40

3 Answers3

1

This is quite simple with the data.table package. Here is my solution:

library(data.table)

temp <- data.table(group = c("a","a","b","b"),
           id = c("a","c","b","d"),
           flag = c(0,0,0,1),
           value = c(5,2,8,1))

temp[flag != 1,ValueSum := ifelse(group == id,sum(value), NA), by = group]

Before the first comma, you are able to specify a logical condition. In the middle you can do column assignment (I'm sure there's a cleaner solution in this case, but it works). After the second comma, you can specify a group-by statement. I would highly recommend getting familiar with the package, it has made my life far easier.

be_green
  • 708
  • 3
  • 12
1

Using dplyr, you could do:

library(dplyr)
d %>% group_by(group) %>% mutate(valueSum=ifelse(flag==0 & group==id,sum(value),0))

  group    id  flag value valueSum
  <chr> <chr> <int> <int>    <dbl>
1     a     a     0     5        7
2     a     c     0     2        0
3     b     b     0     8        9
4     b     e     1     1        0
Lamia
  • 3,845
  • 1
  • 12
  • 19
  • this gives me errors ("level sets of factors are different") - want to refer to the data I supplied? Also - not a huge deal, but the valueSum for row 3 should ideally be 8 (sum only takes place for not flagged rows) – NBC Aug 16 '17 at 22:48
  • That is because you have factors in your dataframe. Either precise stringsAsFactors=F when you construct or read in your dataframe. Otherwise, you can add `d$group=as.character(d$group)` and `d$id=as.character(d$id)`. – Lamia Aug 16 '17 at 22:52
1

In base R, you could use ave with sum and rep to fill in the 0s for the final values.

df_workingFile$valueSum <-
          with(df_workingFile,
               ave(value * !flag, group, FUN=function(x) c(sum(x), rep(0, length(x)-1))))

This returns the desired results.

df_workingFile
  group id flag value valueSum
1     a  a    0     5        7
2     a  c    0     2        0
3     b  b    0     8        8
4     b  e    1     1        0
lmo
  • 37,904
  • 9
  • 56
  • 69