Complicated table calculations

Question

I have a dataframe df_workingFile:

group  | id  | flag | value |
 a     |  a  |  0   |  5    |  
 a     |  c  |  0   |  2    |  
 b     |  b  |  0   |  8    |  
 b     |  e  |  1   |  1    |

I want to sum the value column based on group, but only on the row where group = id (there can only be one of these instances by group). I also don't want to include any values in the sum that have a flag:

group  | id  | flag | value | ValueSum
 a     |  a  |  0   |  5    |   7
 a     |  c  |  0   |  2    |   0
 b     |  b  |  0   |  8    |   8
 b     |  e  |  1   |  1    |   0

Input data:

df_workingFile <- structure(list(group = structure(c(1L, 1L, 2L, 2L), .Label = c("a", 
"b"), class = "factor"), id = structure(c(1L, 3L, 2L, 4L), .Label = c("a", 
"b", "c", "e"), class = "factor"), flag = c(0, 0, 0, 1), value = c(5, 
2, 8, 1)), .Names = c("group", "id", "flag", "value"), row.names = c(NA, 
4L), class = "data.frame")

Please `dput(head( df_workingFile))` and paste into question. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — dww, Aug 16 '17 at 22:34

score 1 · Accepted Answer · answered Aug 16 '17 at 22:39

1

This is quite simple with the data.table package. Here is my solution:

library(data.table)

temp <- data.table(group = c("a","a","b","b"),
           id = c("a","c","b","d"),
           flag = c(0,0,0,1),
           value = c(5,2,8,1))

temp[flag != 1,ValueSum := ifelse(group == id,sum(value), NA), by = group]

Before the first comma, you are able to specify a logical condition. In the middle you can do column assignment (I'm sure there's a cleaner solution in this case, but it works). After the second comma, you can specify a group-by statement. I would highly recommend getting familiar with the package, it has made my life far easier.

answered Aug 16 '17 at 22:39

be_green

708
3
12

this doesn't seem to work with my dataframe. I think it's because you loaded your stuff in as a data.table, so I get an error. Any thoughts? – NBC Aug 16 '17 at 22:46
Yes, convert it to a data.table with as.data.table. – be_green Aug 16 '17 at 22:49
no this still gives errors. try it with my input data – NBC Aug 16 '17 at 22:58

score 1 · Answer 2 · answered Aug 16 '17 at 22:42

1

Using dplyr, you could do:

library(dplyr)
d %>% group_by(group) %>% mutate(valueSum=ifelse(flag==0 & group==id,sum(value),0))

  group    id  flag value valueSum
  <chr> <chr> <int> <int>    <dbl>
1     a     a     0     5        7
2     a     c     0     2        0
3     b     b     0     8        9
4     b     e     1     1        0

answered Aug 16 '17 at 22:42

Lamia

3,845
1
12
19

this gives me errors ("level sets of factors are different") - want to refer to the data I supplied? Also - not a huge deal, but the valueSum for row 3 should ideally be 8 (sum only takes place for not flagged rows) – NBC Aug 16 '17 at 22:48
That is because you have factors in your dataframe. Either precise stringsAsFactors=F when you construct or read in your dataframe. Otherwise, you can add `d$group=as.character(d$group)` and `d$id=as.character(d$id)`. – Lamia Aug 16 '17 at 22:52

lmo · Answer 3 · 2017-08-17T11:26:17.343

In base R, you could use ave with sum and rep to fill in the 0s for the final values.

df_workingFile$valueSum <-
          with(df_workingFile,
               ave(value * !flag, group, FUN=function(x) c(sum(x), rep(0, length(x)-1))))

This returns the desired results.

df_workingFile
  group id flag value valueSum
1     a  a    0     5        7
2     a  c    0     2        0
3     b  b    0     8        8
4     b  e    1     1        0

Complicated table calculations

3 Answers3