-2

I have a dataset of baseball play-by-play data. Here's a simplified example:

team <- c('A','A','A','A','A','A','A',
      'B','B','B','B','B','B','B',
      'C','C','C','C','C','C','C')
event <- c("OUT","WALK","OUT","OUT","HR","WALK","OUT",
       "WALK","OUT","HR","WALK","OUT","OUT","WALK",
        "HR","HR","WALK","WALK","HR","OUT","WALK")
df <- data.frame(team, event)
df

   team event
1     A   OUT
2     A  WALK
3     A   OUT
4     A   OUT
5     A    HR
6     A  WALK
7     A   OUT
8     B  WALK
9     B   OUT
10    B    HR
11    B  WALK
12    B   OUT
13    B   OUT
14    B  WALK
15    C    HR
16    C    HR
17    C  WALK
18    C  WALK
19    C    HR
20    C   OUT
21    C  WALK

I would like to create a new data frame that shows the number of times each event occurred for each team, with each event represented by a new column, like this:

  team OUT WALK HR
1    A   4    2  1
2    B   3    3  1
3    C   1    3  3

I think there must be a way to do this using dplyr, but I haven't been able to figure it out.

mr_gasbag
  • 7
  • 3

1 Answers1

1

We can try with dplyr/tidyr. Get the count based on 'team', 'event', and spread from 'long' to 'wide'

library(tidyverse)
df %>%
   count(team, event) %>% 
   spread(event, n)
# A tibble: 3 × 4
#    team    HR   OUT  WALK
#* <fctr> <int> <int> <int>
#1      A     1     4     2
#2      B     1     3     3
#3      C     3     1     3

If we need the columns to be ordered, then convert the 'event' to factor with levels specified as unique elements of 'event' first

df %>% 
  mutate(event = factor(event, levels = unique(event))) %>% 
  count(team, event) %>% 
  spread(event, n)
# A tibble: 3 × 4
#    team   OUT  WALK    HR
#*  <fctr> <int> <int> <int>
#1      A     4     2     1
#2      B     3     3     1
#3      C     1     3     3

Or with dcast from data.table

library(data.table)
dcast(setDT(df), team~event, length)

Or from table from base R

table(df)
akrun
  • 874,273
  • 37
  • 540
  • 662