In R, I have the following data frame:

    badge_name year.month count
1      Teacher     2009-1  2161
2      Teacher     2009-2  2193
3      Teacher     2009-3  2163
4      Teacher     2009-4  2205
5      Teacher     2009-5  3004
6      Teacher     2009-6  2865
7      Teacher     2009-7  2936
8      Teacher     2009-8  2762
9      Teacher     2009-9  2433
10     Teacher    2009-10  3001
11     Teacher    2009-11  3650
12     Teacher    2009-12  3480
13     Student     2009-1  1980
14     Student     2009-2  1933
15     Student     2009-3  2197
16     Student     2009-4  2243
17     Student     2009-5  2725
18     Student     2009-6  2904
19     Student     2009-7  3069
20     Student     2009-8  3015
21     Student     2009-9  2839
22     Student    2009-10  3603
23     Student    2009-11  4208
24     Student    2009-12  4188
...

I would like to create a new data frame in which the rows are collapsed by badge name and year, with the counts summed:

    badge_name     year   count
1      Teacher     2009   32853
2      Student     2009   34904

How would I go about doing this?

Alexander

1 Answer

Assuming your data frame is named df, using dplyr:

library(dplyr)

df %>%
  mutate(year = substr(year.month, 1, 4)) %>%  # first four characters are the year
  group_by(badge_name, year) %>%
  summarise(count = sum(count))                # one row per badge and year

With base R, you can do something like the following, which returns a badge-by-year matrix of sums:

df$year <- substr(df$year.month, 1, 4)
with(df, tapply(count, df[,c('badge_name', 'year')], sum))
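
If you need the result as a data frame rather than a matrix, aggregate from the stats package (attached by default) is one base R option. This is a minimal sketch, assuming the column names from the question. The df below is rebuilt from the 24 rows shown so that the snippet runs on its own, and the name yearly is only for illustration:

```r
# Rebuild the sample data from the question (2009 rows only)
df <- data.frame(
  badge_name = rep(c("Teacher", "Student"), each = 12),
  year.month = rep(paste0("2009-", 1:12), times = 2),
  count = c(2161, 2193, 2163, 2205, 3004, 2865,
            2936, 2762, 2433, 3001, 3650, 3480,
            1980, 1933, 2197, 2243, 2725, 2904,
            3069, 3015, 2839, 3603, 4208, 4188)
)

# The first four characters of year.month are the year
df$year <- substr(df$year.month, 1, 4)

# Sum count within each badge_name/year combination
yearly <- aggregate(count ~ badge_name + year, data = df, FUN = sum)
yearly
```

The result has one row per badge and year, with columns badge_name, year and count. The rows come back sorted by the grouping columns rather than in the original order.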

Mankind_008
  • Is there a way to do this without installing an external package? Also can you tell me what `%>%` does? EDIT: Found `%>%` https://stackoverflow.com/questions/24536154/what-does-mean-in-r – Alexander Jan 22 '19 at 20:34
  • Added a base R alternative. – Mankind_008 Jan 22 '19 at 22:35