0

I have

DATA = data.frame(GROUP = c(1,2,1,2,1,2), TIME = c(1,1,2,2,3,3), SCORE = c(0,7,9,8,3,4))

and seek to create

WANT = data.frame(GROUP = c(1,2,3,1,2,3,1,2,3), TIME = c(1,1,1,2,2,2,3,3,3), SCORE = c(0,7,7,9,8,17,3,4,7))

Where for each value of TIME I sum up the SCORE values for all GROUP

I try without success,

WANT = DATA %>% group_by(TIME) %>% rowwise() %>% mutate(`3' = sum())
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
bvowe
  • 3,004
  • 3
  • 16
  • 33

3 Answers3

3

I suggest summarize the data and then combine (bind_rows) with the original data.

library(dplyr)
DATA %>%
  group_by(TIME) %>%
  summarize(GROUP = max(GROUP)+1, SCORE = sum(SCORE)) %>%
  bind_rows(DATA, .) %>%
  arrange(TIME, GROUP)
#   GROUP TIME SCORE
# 1     1    1     0
# 2     2    1     7
# 3     3    1     7
# 4     1    2     9
# 5     2    2     8
# 6     3    2    17
# 7     1    3     3
# 8     2    3     4
# 9     3    3     7
r2evans
  • 141,215
  • 6
  • 77
  • 149
1

I recommend the accepted answer, but for those unable or not wanting to use external packages (for whatever reason), for posterity a base R approach could use tapply and rbind/cbind (not I added a 3rd group in the original

Want <- rbind(DATA, 
              cbind(GROUP = tapply(DATA$GROUP, DATA$TIME, \(x) max(x)+1), 
                    TIME = seq_along(unique(DATA$TIME)), 
                    SCORE = tapply(DATA$SCORE, DATA$TIME, sum)))
Want <- Want[order(Want$TIME, Want$GROUP),]

#  GROUP TIME SCORE
#      1    1     0
#      2    1     7
#      3    1     7
#      1    2     9
#      2    2     8
#      3    2    17
#      1    3     3
#      2    3     4
#      3    3     7
jpsmith
  • 11,023
  • 5
  • 15
  • 36
  • 1
    Good idea to include base R. The "summarize by group" component has other options, see https://stackoverflow.com/q/11562656/3358272, https://stackoverflow.com/q/1660124/3358272, https://stackoverflow.com/q/12064202/3358272 (multiple functions). – r2evans Aug 30 '23 at 18:29
0

Note that these only explicitly reference DATA once at the top which is consistent with the idea of pipeline flow where the input comes in at top and the output at the bottom.

1) Grouping by TIME, set last row of each group appropriately using group_modify and add_row and then rearrange column order in the select to be in the original order. The select statement can be omitted if the order does not matter and the data.frame statement can be omitted if a tibble is ok.

library(dplyr)

res <- DATA %>%
  group_by(TIME) %>%
  group_modify(~ add_row(., GROUP = max(.$GROUP) + 1, SCORE = sum(.$SCORE))) %>%  
  ungroup %>%
  select(GROUP, TIME, SCORE) %>%
  as.data.frame

identical(res, WANT)
## [1] TRUE

2) Another approach with dplyr is to use nest_by and reframe in place of group_modify and add_row.

library(dplyr)

res <- DATA %>%
  nest_by(TIME) %>%
  reframe(data = bind_rows(data, summarize(data,
    GROUP = max(GROUP) + 1, SCORE = sum(SCORE)))) %>%
  unnest(data) %>%
  select(GROUP, TIME, SCORE) %>%
  as.data.frame

identical(res, WANT)
## [1] TRUE

3) Another alternative is to do it column by column:

res <- DATA %>%
  group_by(TIME) %>%
  reframe(GROUP = c(GROUP, max(GROUP) + 1), SCORE = c(SCORE, sum(SCORE))) %>%
  select(GROUP, TIME, SCORE) %>%
  as.data.frame

identical(res, WANT)
## [1] TRUE

Update

Simplify. Add (2) and (3).

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341