
I'm trying to make a shot chart in which the color gradient represents the average success rate in each bin.

The following script colors each bin by its shot count. How can I change it so that the color represents the average success rate in each bin instead of the count? The script's output chart is attached below.

#rm(list=ls())
data3 <- read.csv("data10.csv", header = TRUE)

# Court background image (only used if the annotation_custom() line is uncommented)
library(jpeg)
library(grid)
court <- rasterGrob(readJPEG("nba_court.jpg"),
                    width = unit(1, "npc"), height = unit(1, "npc"))

library(hexbin)   # stat_bin_hex() needs the hexbin package
library(ggplot2)
# Each hexagon is filled by the number of shots it contains (the default count)
ggplot(data3, aes(x = loc_x, y = loc_y)) +
#  annotation_custom(court, -247, 253, -50, 418) +
  stat_bin_hex(bins = 18, colour = "gray", alpha = 0.8) +
  scale_fill_gradientn(colours = c("cyan", "yellow", "red")) +
  guides(alpha = "none", size = "none") +
  xlim(250, -250) +   # larger value first reverses the x axis
  ylim(-52, 418) +
  geom_rug(alpha = 0.5) +
  coord_fixed() +
  ggtitle("Kobe Bryant shots") +
  theme(line = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        legend.title = element_blank(),
        plot.title = element_text(size = 17, lineheight = 1.2, face = "bold"))

[Output chart: hexagonal bins colored by shot count]

Dataset sample:

data3 <- data.frame(matrix(data=c(-98,-75,-119,83,10,-103,-191,69,196,-21,-106,-127,-180,50,125,200,34,45,99,120,108,184,102,206,113,-3,93,94,164,101,82,146,108,24,56,77,67,200,250,-45,1,0,0,0,1,1,0,0,0,0,1,1,0,1,0,1,1,0,0,1),
                nrow=20,ncol=3))
colnames(data3)<-c("loc_x","loc_y","shot_made_flag")
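As a quick check of what "average success in each bin" means, here is a base-R sketch on the sample data above. It assumes square 100-unit bins purely for illustration (ggplot2 uses hexagons, so its cells will differ); the point is only that a bin's success rate is the mean of the 0/1 shot_made_flag values that fall into it. The names bins, rate and count are hypothetical.

```r
# Sample shots from the question, written column by column
data3 <- data.frame(
  loc_x = c(-98, -75, -119, 83, 10, -103, -191, 69, 196, -21,
            -106, -127, -180, 50, 125, 200, 34, 45, 99, 120),
  loc_y = c(108, 184, 102, 206, 113, -3, 93, 94, 164, 101,
            82, 146, 108, 24, 56, 77, 67, 200, 250, -45),
  shot_made_flag = c(1, 0, 0, 0, 1, 1, 0, 0, 0, 0,
                     1, 1, 0, 1, 0, 1, 1, 0, 0, 1)
)

# Assumption for illustration: square 100-unit bins instead of hexagons
data3$bin_x <- cut(data3$loc_x, breaks = seq(-250, 250, by = 100))
data3$bin_y <- cut(data3$loc_y, breaks = seq(-100, 300, by = 100))

# Success rate = mean of the 0/1 flag; shot count = number of rows in the bin
rate  <- aggregate(shot_made_flag ~ bin_x + bin_y, data = data3, FUN = mean)
count <- aggregate(shot_made_flag ~ bin_x + bin_y, data = data3, FUN = length)

# Both aggregate() calls group identically, so their rows line up
bins <- data.frame(rate[c("bin_x", "bin_y")],
                   success_rate = rate$shot_made_flag,
                   n_shots = count$shot_made_flag)
print(bins)
```

In this sample the busiest square bin (x in (-150, -50], y in (100, 200]) holds 4 shots, 2 of them made, so its success rate is 0.5. A hexbin summary with fun = mean performs the same per-cell calculation, just on hexagons.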
Axeman
User 2014

1 Answer


Use stat_summary_hex() with z = shot_made_flag and fun = mean to calculate the success rate (the mean of the 0/1 flag) inside each bin:

# Create random data
set.seed(1)
data3 <- data.frame(loc_x = runif(1000, -250, 250),
                    loc_y = rnorm(1000, 230, 50),
                    shot_made_flag = rbinom(1000, 1, .5))
library(hexbin)   # stat_summary_hex() needs the hexbin package
library(ggplot2)

# The first two lines have changed (z = shot_made_flag and using fun = mean)
# The fill is now the mean of shot_made_flag per hexagon, i.e. a success rate in [0, 1]
ggplot(data3, aes(x = loc_x, y = loc_y, z = shot_made_flag)) +
  stat_summary_hex(fun = mean, bins = 18, colour = "gray", alpha = 0.8) +
  scale_fill_gradientn(colours = c("cyan", "yellow", "red")) +
  guides(alpha = "none", size = "none") +
  xlim(250, -250) +
  ylim(-52, 418) +
  geom_rug(alpha = 0.5) +
  coord_fixed() +
  ggtitle("Kobe Bryant shots") +
  theme(line = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        legend.title = element_blank(),
        plot.title = element_text(size = 17, lineheight = 1.2, face = "bold"))

[Result chart: hexagonal bins colored by the mean success rate inside each bin]

Edit: rewrote the full answer to use the new data and to produce the desired output (the mean inside each hex cell).
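Because fun can be any R function of the z values in a bin, the same approach can also hide sparsely populated hexagons, whose averages are noisy. This is a sketch, not part of the original answer: rate_if_enough is a hypothetical helper and the 5-shot threshold is an arbitrary choice. It relies on fun.args passing extra arguments to fun, and on drop = TRUE (the default) removing bins where fun returns NA.

```r
library(ggplot2)  # stat_summary_hex() also needs the hexbin package installed

# Random data in the same shape as the answer's example
set.seed(1)
data3 <- data.frame(loc_x = runif(1000, -250, 250),
                    loc_y = rnorm(1000, 230, 50),
                    shot_made_flag = rbinom(1000, 1, .5))

# Hypothetical helper: success rate of a bin, or NA if the bin has too few shots
rate_if_enough <- function(z, min_shots = 5) {
  if (length(z) < min_shots) NA_real_ else mean(z)
}

p <- ggplot(data3, aes(x = loc_x, y = loc_y, z = shot_made_flag)) +
  # drop = TRUE (the default) removes bins where fun returns NA
  stat_summary_hex(fun = rate_if_enough, fun.args = list(min_shots = 5),
                   bins = 18, colour = "gray", alpha = 0.8) +
  # Fixed 0-1 limits keep colors comparable across players or seasons
  scale_fill_gradientn(colours = c("cyan", "yellow", "red"), limits = c(0, 1)) +
  xlim(250, -250) +
  coord_fixed()
print(p)
```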

R. Schifini
  • @Axeman thank you, now the answer reflects the new column that shows if the shot was made or not. – R. Schifini Mar 10 '20 at 22:18
  • Good answer, just what I was looking for, thanks. Do you know if there is any ggplot parameter that allows me to modify the size of each hex cell so that the cell size is directly proportional to the shot count? – User 2014 Mar 11 '20 at 07:03
  • @User2014 maybe [this](https://unconj.ca/blog/custom-hexbin-functions-with-ggplot.html) post helps – R. Schifini Mar 11 '20 at 11:29
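For the follow-up question about making cell size proportional to the shot count, here is one simple alternative, sketched under assumptions and not the technique from the linked post: instead of resizing the hexagons themselves, compute each cell's shot count and success rate with the hexbin package, then draw a point at each cell center, sized by count and colored by success rate. The xbins = 18 value only mirrors bins = 18 above and will not reproduce ggplot2's exact cell grid; hb and cells are hypothetical names.

```r
library(hexbin)
library(ggplot2)

# Random data in the same shape as the answer's example
set.seed(1)
data3 <- data.frame(loc_x = runif(1000, -250, 250),
                    loc_y = rnorm(1000, 230, 50),
                    shot_made_flag = rbinom(1000, 1, .5))

# Bin the shots; IDs = TRUE stores the cell ID of every shot in hb@cID
hb <- hexbin(data3$loc_x, data3$loc_y, xbins = 18, IDs = TRUE)

# One row per non-empty cell: center, shot count, and mean of the 0/1 flag.
# tapply() orders results by sorted cell ID, matching hb@cell and hcell2xy().
centers <- hcell2xy(hb)
cells <- data.frame(x = centers$x,
                    y = centers$y,
                    n_shots = hb@count,
                    success_rate = as.vector(tapply(data3$shot_made_flag, hb@cID, mean)))

p <- ggplot(cells, aes(x = x, y = y)) +
  geom_point(aes(size = n_shots, colour = success_rate), shape = 16) +
  scale_colour_gradientn(colours = c("cyan", "yellow", "red"), limits = c(0, 1)) +
  scale_size_area() +   # point area proportional to the shot count
  scale_x_reverse() +   # same left-right orientation as xlim(250, -250)
  coord_fixed() +
  ggtitle("Shot count (size) and success rate (color)")
print(p)
```

Points are used here only because they are easy to scale; drawing true hexagons of varying size needs custom polygon geometry, which is what the linked post covers.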