I have a list of Statcast data, one entry per day, dating back to 2016. I am trying to aggregate this data to find the mean for each pitcher ID.

I have the following code:

aggpitch <- aggregate(pitchingstat, by=list(pitchingstat$PitcherID),
                  FUN=mean, na.rm = TRUE)

This call aggregates every single column. I only want to aggregate certain columns.

How would I include only certain columns?

  • You want to specify a variable to aggregate - `aggregate(pitchingstat[c("var1","var2")], pitchingstat["PitcherID"], FUN=mean, na.rm=TRUE)` . Alternatively, use the formula interface `aggregate(cbind(var1,var2) ~ PitcherID, data=pitchingstat, FUN=mean, na.rm=TRUE)` . See this old answer - https://stackoverflow.com/a/9723314/496803 – thelatemail Nov 13 '18 at 01:30

3 Answers

2

If you have more than one column that you'd like to summarize, you can use QAsena's approach with the `summarise_at` function, like so:

library(dplyr)

pitchingstat %>%
  group_by(PitcherID) %>%
  summarise_at(vars(col1:coln), mean, na.rm = TRUE)

Check out link below for more examples: https://dplyr.tidyverse.org/reference/summarise_all.html
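As a side note, in dplyr 1.0.0 and later `summarise_at()` is marked as superseded in favour of `across()`. A minimal sketch of the same idea with `across()` might look like the following. The toy data frame and its column names (`col1`, `col2`, `note`) are made up for illustration and only stand in for the real `pitchingstat`.

```r
library(dplyr)

# Hypothetical toy data standing in for the real `pitchingstat`
pitchingstat <- data.frame(
  PitcherID = c(1, 1, 2, 2),
  col1      = c(90, 92, 85, NA),
  col2      = c(2200, 2300, 2100, 2150),
  note      = c("a", "b", "c", "d")  # a column we do not want to average
)

# Average only col1 through col2 within each pitcher, ignoring NAs
aggpitch <- pitchingstat %>%
  group_by(PitcherID) %>%
  summarise(across(col1:col2, ~ mean(.x, na.rm = TRUE)))

print(aggpitch)
```

The `note` column is simply dropped from the result because it is neither a grouping column nor selected inside `across()`.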

On_an_island
0

Replace the first argument (`pitchingstat`) with only the column or columns you want to aggregate, for example `pitchingstat["var1"]` or `pitchingstat[c("var1", "var2")]`.
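A rough sketch of that idea in base R, assuming two numeric columns called `var1` and `var2` (placeholder names, as is the toy data frame itself):

```r
# Hypothetical toy data standing in for the real `pitchingstat`
pitchingstat <- data.frame(
  PitcherID = c(1, 1, 2, 2),
  var1      = c(90, 92, 85, NA),
  var2      = c(2200, 2300, 2100, 2150),
  other     = c("a", "b", "c", "d")  # not aggregated
)

# Pass only the columns to average as the first argument;
# naming the `by` list element gives the group column a readable name
aggpitch <- aggregate(pitchingstat[c("var1", "var2")],
                      by = list(PitcherID = pitchingstat$PitcherID),
                      FUN = mean, na.rm = TRUE)

print(aggpitch)
```

Because `na.rm = TRUE` is passed through to `mean`, the missing `var1` value for pitcher 2 is ignored rather than turning that group's mean into `NA`.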

12b345b6b78
0

How about this?

library(tidyverse)
aggpitch <- pitchingstat %>% 
  group_by(PitcherID) %>% 
  summarise(pitcher_mean = mean(variable)) #replace 'variable' with your variable of interest here

or

library(tidyverse)
aggpitch <- pitchingstat %>%
  select(PitcherID, var_1, var_2) %>%  # keep the grouping column as well
  group_by(PitcherID) %>% 
  summarise(pitcher_mean = mean(var_1),
            pitcher_mean2 = mean(var_2))

I think this works, but I could use a dummy example of your data to test it against.

QAsena