
I have a data frame that shows the number of publications by year, but I am only interested in conference and journal publications. I would like to sum all other categories into a single "Other" type.

Example data frame:

year  type               n
1994  Conference         2
1994  Journal            3
1995  Conference        10
1995  Editorship         3
1996  Conference        20
1996  Editorship         2
1996  Books and Thesis   3

And the result would be:

year  type         n
1994  Conference   2
1994  Journal      3
1995  Conference  10
1995  Other        3
1996  Conference  20
1996  Other        5
Ronak Shah
ABueno
  • Possible duplicate of https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group – akrun Jun 05 '17 at 03:59
  • You are not summing the Others, because there are two Others. Do you simply want to rename Editorship and Books and Thesis to Others, or do you want to sum everything after that? – Ajay Ohri Jun 05 '17 at 04:13

3 Answers


With dplyr, we can replace every type other than "Journal" or "Conference" with "Other" and then sum `n` by year and type.

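library(dplyr)
df %>%
  # Relabel every type that is not exactly "Journal" or "Conference" as "Other";
  # the negative lookahead (?!...) requires perl = TRUE
  mutate(type = sub("^(?!(Journal|Conference)$).*", "Other", type, perl = TRUE)) %>%
  group_by(year, type) %>%
  summarise(n = sum(n), .groups = "drop")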


#  year       type     n
#  <int>      <chr> <int>
#1  1994 Conference     2
#2  1994    Journal     3
#3  1995 Conference    10
#4  1995      Other     3
#5  1996 Conference    20
#6  1996      Other     5
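To check what the corrected pattern does, here is a small base-R sketch that applies it to each distinct category from the question (the vector `types` is only for illustration). The negative lookahead `(?!...)` is a PCRE feature, which is why `perl = TRUE` is needed; without the lookahead, a pattern like `(Journal|Conference)` would relabel exactly the two categories you want to keep.

```r
# Distinct category names from the example data (illustrative only)
types <- c("Conference", "Journal", "Editorship", "Books and Thesis")

# At the start of the string, the lookahead fails only when the whole string
# is exactly "Journal" or "Conference"; otherwise .* matches the full string
# and it is replaced by "Other"
collapsed <- sub("^(?!(Journal|Conference)$).*", "Other", types, perl = TRUE)

print(collapsed)
# "Conference" "Journal" "Other" "Other"
```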
Ronak Shah

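We can use data.table:

library(data.table)
library(stringr)
# Relabel every type that is not exactly "Journal" or "Conference" as "Other"
# (ICU regex negative lookahead), then sum n within each year/type group
setDT(df)[, .(n = sum(n)), by = .(year, type = str_replace(as.character(type),
       '^(?!(Journal|Conference)$).*', 'Other'))]
#   year       type  n
#1: 1994 Conference  2
#2: 1994    Journal  3
#3: 1995 Conference 10
#4: 1995      Other  3
#5: 1996 Conference 20
#6: 1996      Other  5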
akrun
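# Give both unwanted levels the same name; duplicate level names are merged into one level
levels(df$type)[levels(df$type) %in% c("Editorship", "Books and Thesis")] <- "Other"
# Sum n for every type/year combination
aggregate(n ~ type + year, data = df, sum)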

#         type year  n
# 1 Conference 1994  2
# 2    Journal 1994  3
# 3      Other 1995  3
# 4 Conference 1995 10
# 5      Other 1996  5
# 6 Conference 1996 20
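This works because assigning the same name to several levels of a factor merges them into a single level. A minimal sketch on a standalone factor (the name `f` is only for illustration):

```r
# Factor with the four categories from the question; levels are sorted alphabetically
f <- factor(c("Conference", "Journal", "Editorship", "Books and Thesis"))
levels(f)
# "Books and Thesis" "Conference" "Editorship" "Journal"

# Renaming two levels to the same value collapses them into one "Other" level
levels(f)[levels(f) %in% c("Editorship", "Books and Thesis")] <- "Other"
levels(f)
# "Other" "Conference" "Journal"
as.character(f)
# "Conference" "Journal" "Other" "Other"
```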

Input data:

df <- data.frame(
  year = c(1994L, 1994L, 1995L, 1995L, 1996L, 1996L, 1996L),
  type = factor(c("Conference", "Journal", "Conference", "Editorship",
                  "Conference", "Editorship", "Books and Thesis")),
  n    = c(2L, 3L, 10L, 3L, 20L, 2L, 3L)
)
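If `type` might be either a character or a factor column, a variant of the same base-R idea avoids relabelling levels and uses `%in%` on a character copy instead. This is a sketch assuming the column names from the question; the data is rebuilt inline so the block runs on its own, and `keep` and `type_chr` are illustrative names.

```r
df <- data.frame(
  year = c(1994L, 1994L, 1995L, 1995L, 1996L, 1996L, 1996L),
  type = factor(c("Conference", "Journal", "Conference", "Editorship",
                  "Conference", "Editorship", "Books and Thesis")),
  n    = c(2L, 3L, 10L, 3L, 20L, 2L, 3L)
)

keep <- c("Conference", "Journal")
# Convert to character first so ifelse() returns labels, not factor codes
type_chr <- as.character(df$type)
df$type <- ifelse(type_chr %in% keep, type_chr, "Other")

# Sum n within each year/type combination, then sort by year and type
res <- aggregate(n ~ year + type, data = df, FUN = sum)
res <- res[order(res$year, res$type), ]
print(res, row.names = FALSE)
```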
Adam Quek