Calculate average points in each bin of a shot chart with R

Question

I'm trying to make a shot chart in which the color gradient represents the average success rate in each bin.

The script below gives the count of shots in each bin. How can I change it so that the fill represents the average success in each bin instead of the count? I have attached the script's output chart.

#rm(list=ls())
data3 <- read.csv("data10.csv", header = TRUE)

library(jpeg)  # readJPEG()
library(grid)  # rasterGrob(), unit()
court <- rasterGrob(readJPEG("nba_court.jpg"),
                    width = unit(1, "npc"), height = unit(1, "npc"))

library(hexbin)
library(ggplot2)
ggplot(data3, aes(x = loc_x, y = loc_y)) +
#  annotation_custom(court, -247, 253, -50, 418) +
  # fill = number of shots that fall in each hexagon
  stat_bin_hex(bins = 18, colour = "gray", alpha = 0.8) +
  scale_fill_gradientn(colours = c("cyan", "yellow", "red")) +
  guides(alpha = "none", size = "none") +
  xlim(250, -250) +  # reversed x axis
  ylim(-52, 418) +
  geom_rug(alpha = 0.5) +
  coord_fixed() +
  ggtitle("Kobe Bryant shots") +
  theme(line = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        legend.title = element_blank(),
        plot.title = element_text(size = 17, lineheight = 1.2, face = "bold"))

DATASET SAMPLE:

data3 <- data.frame(matrix(data=c(-98,-75,-119,83,10,-103,-191,69,196,-21,-106,-127,-180,50,125,200,34,45,99,120,108,184,102,206,113,-3,93,94,164,101,82,146,108,24,56,77,67,200,250,-45,1,0,0,0,1,1,0,0,0,0,1,1,0,1,0,1,1,0,0,1),
                nrow=20,ncol=3))
colnames(data3)<-c("loc_x","loc_y","shot_made_flag")
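To make "average success" concrete: because `shot_made_flag` is 0 or 1, its mean is simply shots made divided by shots attempted. The minimal sketch below rebuilds the sample above and applies the same per-group mean over a hypothetical two-bin split (left and right of `loc_x = 0`) standing in for hexagons; `side`, `overall_rate` and `rate_by_side` are illustrative names, not part of the question's code.

```r
# The question's dataset sample, rebuilt column by column
data3 <- data.frame(
  loc_x = c(-98, -75, -119, 83, 10, -103, -191, 69, 196, -21,
            -106, -127, -180, 50, 125, 200, 34, 45, 99, 120),
  loc_y = c(108, 184, 102, 206, 113, -3, 93, 94, 164, 101,
            82, 146, 108, 24, 56, 77, 67, 200, 250, -45),
  shot_made_flag = c(1, 0, 0, 0, 1, 1, 0, 0, 0, 0,
                     1, 1, 0, 1, 0, 1, 1, 0, 0, 1)
)

# Mean of a 0/1 flag = shots made / shots attempted
overall_rate <- mean(data3$shot_made_flag)   # 9 / 20 = 0.45

# The same idea per bin, using a hypothetical two-bin split at x = 0
side <- ifelse(data3$loc_x < 0, "left", "right")
rate_by_side <- tapply(data3$shot_made_flag, side, mean)
rate_by_side  # left = 4/9, right = 5/11
```

A hex-binned chart does exactly this, only with one group per hexagon instead of two halves of the court.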

Please share a portion of your data that someone trying to help you can copy and paste — astrofunkswag, Mar 10 '20 at 21:09
To get help on this site, you want to share a small portion of data that somebody can just copy and paste. Take a look at `dput(head(data3))`. See [this link](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more info — astrofunkswag, Mar 10 '20 at 21:14

R. Schifini · Accepted Answer · 2020-03-10T22:16:58.873

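You should use `stat_summary_hex()` instead of `stat_binhex()`: map the made/missed flag to `z` and set `fun = mean`, so each hexagon is filled with the mean of `shot_made_flag` in that bin, i.e. the fraction of shots made there: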

# Create random data
set.seed(1)
data3 = data.frame(loc_x = runif(1000,-250,250), 
                   loc_y = rnorm(1000,230,50), 
                   shot_made_flag = rbinom(1000,1,.5))
library(hexbin)   # stat_summary_hex() needs the hexbin package installed
library(ggplot2)

# The first two lines have changed (z = shot_made_flag and using fun = mean)
ggplot(data3, aes(x=loc_x, y=loc_y, z = shot_made_flag)) + 
  stat_summary_hex(fun = mean, bins = 18, colour = "gray", alpha = 0.8) +
  scale_fill_gradientn(colours = c("cyan","yellow","red")) +
  guides(alpha = "none", size = "none") +
  xlim(250, -250) +
  ylim(-52, 418) +
  geom_rug(alpha = 0.5) +
  coord_fixed() +
  ggtitle("Kobe Bryant shots") +
  theme(line = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        legend.title = element_blank(),
        plot.title = element_text(size = 17, lineheight = 1.2, face = "bold"))
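One way to confirm that the fill now shows a per-hexagon mean rather than a count is to inspect the data ggplot2 computes for the layer. A minimal sketch, assuming ggplot2's `layer_data()` and that `stat_summary_hex()` stores each cell's summary in a column named `value` (the column its default `fill = after_stat(value)` mapping reads); `shots`, `p` and `hex_means` are illustrative names:

```r
library(ggplot2)  # stat_summary_hex() also needs the hexbin package installed

# Random shots generated as in the answer above
set.seed(1)
shots <- data.frame(loc_x = runif(1000, -250, 250),
                    loc_y = rnorm(1000, 230, 50),
                    shot_made_flag = rbinom(1000, 1, 0.5))

p <- ggplot(shots, aes(x = loc_x, y = loc_y, z = shot_made_flag)) +
  stat_summary_hex(fun = mean, bins = 18)

# layer_data() returns the data ggplot2 computed for drawing the layer;
# for stat_summary_hex() each row is one hexagon and `value` holds fun(z)
hex_means <- layer_data(p, 1)$value
summary(hex_means)
```

Because every `shot_made_flag` is 0 or 1, each value lies between 0 and 1; the counts from `stat_binhex()` would instead be whole numbers, typically well above 1.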


Edited the full answer to use the new data and to reflect the desired output (the mean inside each hex cell).

@Axeman thank you, now the answer reflects the new column that shows if the shot was made or not. — R. Schifini, Mar 10 '20 at 22:18
Good answer, just what I was looking for, thanks. Do you know if there is any ggplot parameter that allows me to modify the size of each hex cell so that the cell size is directly proportional to the shot count? — User 2014, Mar 11 '20 at 07:03
@User2014 maybe [this](https://unconj.ca/blog/custom-hexbin-functions-with-ggplot.html) post helps — R. Schifini, Mar 11 '20 at 11:29
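For the follow-up about sizing cells by shot count, the linked post covers custom hexbin functions. A simpler alternative, swapped in here and not taken from that post, is to compute the per-cell statistics with the hexbin package directly and draw one circle per non-empty cell, with area proportional to the shot count and fill showing the success rate. This sketch assumes hexbin's `IDs = TRUE` option (which records each point's cell in the `cID` slot) and `hcell2xy()`; `shots`, `hb`, `rate` and `cells` are illustrative names.

```r
library(hexbin)
library(ggplot2)

# Hypothetical random shots, generated as in the answer
set.seed(1)
shots <- data.frame(loc_x = runif(1000, -250, 250),
                    loc_y = rnorm(1000, 230, 50),
                    shot_made_flag = rbinom(1000, 1, 0.5))

# Bin the shots; IDs = TRUE stores the cell id of every shot in hb@cID
hb <- hexbin(shots$loc_x, shots$loc_y, xbins = 18, IDs = TRUE)
centres <- hcell2xy(hb)  # centres of the non-empty cells, in hb@cell order

# Success rate per cell, matched to hb@cell by cell id
rate <- tapply(shots$shot_made_flag, hb@cID, mean)
cells <- data.frame(x = centres$x,
                    y = centres$y,
                    n = hb@count,
                    pct = as.vector(rate[as.character(hb@cell)]))

# One circle per cell: area proportional to shot count, fill = success rate
ggplot(cells, aes(x = x, y = y, size = n, fill = pct)) +
  geom_point(shape = 21, colour = "gray") +
  scale_fill_gradientn(colours = c("cyan", "yellow", "red")) +
  scale_size_area(max_size = 8) +
  coord_fixed()
```

Circles stand in for scaled hexagons here to keep the sketch short; `scale_size_area()` makes the circle area, rather than its radius, proportional to `n`.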
