dplyr, summarise categorical variable

Question

I want to summarise my data small for each different video.id using dplyr.

small %>% 
  group_by(Video.ID) %>% 
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = mean(Category))

mean(Category) is clearly the wrong approach. How do I get it just to use the value that is repeated several times (one video.id has always the same category no matter how often it appears in the dataframe).

My dataframe looks like this :

small

# A tibble: 6 x 7
     X1  X1_1 Video.ID    Video.Duration..sec. Category Owned.Views Partner.Revenue
  <int> <int> <chr>                      <int> <chr>          <int>           <dbl>
1     1     1 ---0zh9uzSE                 1184 gadgets            6               0
2     2     2 ---0zh9uzSE                 1184 gadgets            6               0
3     3     3 ---0zh9uzSE                 1184 gadgets            2               0
4     4     4 ---0zh9uzSE                 1184 gadgets            1               0
5     5     5 ---0zh9uzSE                 1184 gadgets            1               0
6     6     6 ---0zh9uzSE                 1184 gadgets            3               0

small <- 
  structure(list(X1 = 1:6, 
                 X1_1 = 1:6, 
                 Video.ID = c("---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE"), 
                 Video.Duration..sec. = c(1184L, 1184L, 1184L, 1184L, 1184L, 1184L), 
                 Category = c("gadgets", "gadgets", "gadgets", "gadgets", "gadgets", "gadgets"), 
                 Owned.Views = c(6L, 6L, 2L, 1L, 1L, 3L), 
                 Partner.Revenue = c(0, 0, 0, 0, 0, 0)), 
            row.names = c(NA, -6L), 
            class = c("tbl_df", "tbl", "data.frame"))

Please copy and paste the output of `dput(head(small))` into the question to make it better [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). And also share a small example of expected output. — kath, May 12 '18 at 22:05
i hope that helps, I dont know yet how to produce expected output. — hmmmbob, May 12 '18 at 22:09
@hmmmbob I'll suggest you can use count for Category. Something like for each Category how many videos. — MKR, May 12 '18 at 22:14

score 6 · Accepted Answer · answered May 12 '18 at 22:11

You have at least two options to solve this:

Add the Category column to your group_by:

small %>% 
  group_by(Video.ID, cat = Category) %>% 
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.))

# A tibble: 1 x 4
# Groups:   Video.ID [?]
#     Video.ID    cat      sumr   len
#     <chr>       <chr>   <dbl> <dbl>
#   1 ---0zh9uzSE gadgets     0  1184

Or use unique(Catregory):

small %>% 
  group_by(Video.ID) %>% 
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = unique(Category))

# A tibble: 1 x 4
#   Video.ID     sumr   len cat    
#   <chr>       <dbl> <dbl> <chr>  
# 1 ---0zh9uzSE     0  1184 gadgets

The first option, might be perferred, because it still works if you have multiple categories per id.

score 1 · Answer 2 · answered May 12 '18 at 22:21

Since it is a unique category for each video_id, you can have cat = Category[1], as in

small %>% group_by(Video.ID) %>% 
      summarise(sumr=sum(Partner.Revenue), len = mean(Video.Duration..sec.), 
      cat = Category[1])

# A tibble: 1 x 4
#  Video.ID     sumr   len cat    
#  <chr>       <dbl> <dbl> <chr>  
#  1 ---0zh9uzSE     0  1184 gadgets

dplyr, summarise categorical variable

2 Answers2