2

How would I go about summing the values of one column for all rows containing a name that is part of a target group, as well as a specific year which is in another column?

Example: I would like to sum the values of a, b, c for 2015 to make a new Category, "e", and to do the same for 2016.

Year Category Value
2015 a        2
2015 b        3
2015 c        2
2015 d        1
2016 a        7
2016 b        2
2016 c        1
2016 d        1

To give something like this:

Year Category Value
2015 d        1
2015 e        7
2016 d        1
2016 e        10

Thanks!

Fragilaria
  • 159
  • 7
  • related: https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family – jogo Oct 25 '18 at 14:32
  • 1
    `aggregate(Value ~ Year + I(c("d", "e")[Category %in% c("a", "b", "c") + 1]), data=df1, FUN=sum )` :D – Andre Elrico Oct 25 '18 at 14:33

2 Answers2

1

Try aggregate, defining first a group of categories.

target <- c("a", "b", "c")
group <- factor(dat$Category %in% target,
                levels = c(TRUE, FALSE),
                labels = c("e", "d"))
agg <- aggregate(Value ~ group + Year, dat, sum)[c(2, 1, 3)]

agg
#  Year group Value
#1 2015     e     7
#2 2015     d     1
#3 2016     e    10
#4 2016     d     1

Edit.

If you have many categories and want to collapse some of them while leaving the others as they are, CRAN package forcats function fct_collapse is a good way to do it.

group <- forcats::fct_collapse(dat$Category,
                      "e" = target)
group
#[1] e e e d e e e d
#Levels: e d

Then aggregate as above.

Data.

dat <-
structure(list(Year = c(2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 
2016L, 2016L), Category = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"), Value = c(2L, 
3L, 2L, 1L, 7L, 2L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-8L))
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • Thank you! Now I've realized the importance of being specific. This works well, but I have 15 different "Categories". How can I sum 5 of those, while leaving the remaining ones alone? – Fragilaria Oct 25 '18 at 14:42
1

Here is a more compact dplyr option

dat %>%
        mutate(Category = ifelse(Category %in% c("a", "b", "c"), "e", # put in c() the Categories you want to sum
                                 as.character(Category))) %>%
        group_by(Year, Category) %>%
        summarise(Value = sum(Value))
# A tibble: 4 x 3
# Groups:   Year [?]
   Year Category Value
  <int>    <chr> <int>
1  2015        d     1
2  2015        e     7
3  2016        d     1
4  2016        e    10
nghauran
  • 6,648
  • 2
  • 20
  • 29