
I am trying to create a graph in which, because there are so many points, the green region fades to black at its edges while the center stays green. The code I am currently using to create this graph is:

plot(snb$px,snb$pz,col=snb$event_type,xlim=c(-2,2),ylim=c(1,6))

I looked into contour plotting, but that did not work for this. The coloring variable, event_type, is a factor.

Thanks!

[Image: graph]

BaseballR
  • So what do you want the result to look like if not what you already have? Also, could you provide some simulated example data? – thelatemail Jul 18 '13 at 23:11
  • You should bin your points so there are fewer of them. Look at `hexbin`, even if it is not obvious how to aggregate the z dimension (the event_type). – agstudy Jul 18 '13 at 23:41
  • You can sometimes improve the appearance by shrinking the size of the "points" with cex=0.1. Other times it is necessary to index into a vector of transparent colors. Answer @thelatemail's question, please. – IRTFM Jul 19 '13 at 00:11
  • Sorry for taking so long to get back to you; I left work right after posting this problem. These points are called strikes versus called balls in MLB games, plotted by where each pitch crossed the plate. Ideally I would like some sort of contour-type plot [link](http://www.originlab.com/www/helponline/Origin/en/images/Creating_Contour_Graphs/A_Note_about_Contour_Graphs-09.png) where the middle can be read as a 100% called-strike rate and, moving outward, a smaller and smaller percentage of the pitches in each zone are called strikes. Does that make sense at all? @thelatemail – BaseballR Jul 19 '13 at 15:12
  • There isn't really a way to simulate this kind of data, so here are the first 100 rows of the data as a CSV at this download link: [link](http://temp-share.com/show/dPf3UC9tW) – BaseballR Jul 19 '13 at 16:46
  • you might want to compute kernel density estimates (see `kde2d` in the MASS package) for strikes and for the total data set, then take the ratio of strikes to totals (`kde2d` will return a matrix of estimated densities) and plot contours of the ratio ... – Ben Bolker Jul 20 '13 at 15:46
  • The pitchRx package (http://cran.r-project.org/web/packages/pitchRx/; see also http://cpsievert.wordpress.com/2013/01/13/pitchrx-shiny-fun-flexible-mlb-pitchfx-visualization/) seems to do a lot of this. – Ben Bolker Jul 20 '13 at 16:15
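Ben Bolker's kde2d() suggestion can be sketched roughly as below. This is a minimal sketch, not code from the thread: the data are simulated (the column names px, pz and event_type follow the question, but the level name "called_strike" and the rule that generates the calls are hypothetical). Both density estimates use the same grid and bandwidth so they can be divided cell by cell, and because each density integrates to 1, they are rescaled by their counts before dividing to get an estimated called-strike probability.

```r
library(MASS)  # provides kde2d() and bandwidth.nrd()

# Hypothetical stand-in for snb: simulated pitch locations and calls.
set.seed(1)
n <- 2000
px <- runif(n, -2, 2)
pz <- runif(n, 1, 5)
d <- sqrt((px / 0.9)^2 + (pz - 2.5)^2)   # hypothetical distance from zone center
p_call <- plogis(6 * (1 - d))            # strikes more likely near the center
event_type <- factor(ifelse(runif(n) < p_call, "called_strike", "ball"))
snb <- data.frame(px, pz, event_type)

# Same grid limits and the same bandwidth for both estimates,
# so the two density surfaces line up cell by cell.
lims <- c(-2, 2, 1, 5)
h <- c(bandwidth.nrd(snb$px), bandwidth.nrd(snb$pz))
is_strike <- snb$event_type == "called_strike"
dens_all <- kde2d(snb$px, snb$pz, h = h, n = 50, lims = lims)
dens_strike <- kde2d(snb$px[is_strike], snb$pz[is_strike],
                     h = h, n = 50, lims = lims)

# Each density integrates to 1, so rescale by the counts before dividing:
# P(strike | location) is estimated as n_strike * f_strike / (n_all * f_all)
strike_prob <- (sum(is_strike) * dens_strike$z) / (nrow(snb) * dens_all$z)

# Green where strikes are nearly certain, fading to black where they are rare
filled.contour(dens_all$x, dens_all$y, strike_prob, zlim = c(0, 1),
               color.palette = colorRampPalette(c("black", "green")),
               xlab = "px", ylab = "pz")
```

With real data you would replace the simulated block with `snb <- read.csv('MLB.csv')` and use whatever level name marks a called strike. Using one shared bandwidth also keeps the ratio between 0 and 1, since the strike kernels are a subset of all the kernels.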

2 Answers


This is a great problem for ggplot2.

First, read the data in:

snb <- read.csv('MLB.csv')

With your data frame, you could try plotting partly transparent points, colored according to the factor event_type:

require(ggplot2)
# semi-transparent points, colored by the event_type factor
p1 <- ggplot(data = snb, aes(x = px, y = pz, color = event_type)) + 
      geom_point(alpha = 0.5)
print(p1)

and then you get this:

[Image: resulting plot]

Or, you might want to think about plotting this as a heatmap using geom_bin2d(), and plotting facets (subplots) for each different event_type, like this:

p2 <- ggplot(data = snb, aes(x = px, y = pz)) + 
  geom_bin2d(binwidth = c(0.25, 0.25)) + 
  facet_wrap(~ event_type)
print(p2)

This makes one plot per level of the factor, where the color shows the number of data points in each 0.25-by-0.25 bin. If you have more than about 5 or 6 levels, though, this might look pretty bad. From the small data sample you supplied, I got this:

[Image: resulting faceted heatmap]
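The geom_bin2d() heatmap colors each bin by its count. For the called-strike percentage described in the comments, a rough base-R sketch is to cut both coordinates into the same 0.25-wide bins and take the share of called strikes in each bin. The data below are simulated, and the level name "called_strike" and the rectangular zone used to generate them are hypothetical, not taken from the thread.

```r
# Hypothetical stand-in for snb; replace with snb <- read.csv('MLB.csv').
set.seed(2)
n <- 3000
snb <- data.frame(px = runif(n, -2, 2), pz = runif(n, 1, 5))
inside <- abs(snb$px) < 0.85 & snb$pz > 1.5 & snb$pz < 3.5  # hypothetical zone
snb$event_type <- factor(ifelse(runif(n) < ifelse(inside, 0.9, 0.1),
                                "called_strike", "ball"))

# 0.25 x 0.25 bins, matching the binwidth in the geom_bin2d() example
x_breaks <- seq(-2, 2, by = 0.25)
z_breaks <- seq(1, 5, by = 0.25)
x_bin <- cut(snb$px, x_breaks, include.lowest = TRUE)
z_bin <- cut(snb$pz, z_breaks, include.lowest = TRUE)

# Share of pitches in each bin that were called strikes (NA for empty bins);
# rows are px bins, columns are pz bins
strike_pct <- tapply(snb$event_type == "called_strike",
                     list(x_bin, z_bin), mean)

# image() takes the bin midpoints plus the matrix of per-bin values
x_mid <- head(x_breaks, -1) + 0.125
z_mid <- head(z_breaks, -1) + 0.125
image(x_mid, z_mid, strike_pct, zlim = c(0, 1),
      col = colorRampPalette(c("black", "green"))(20),
      xlab = "px", ylab = "pz")
```

Bins with only a handful of pitches give noisy percentages, so with a small sample, wider bins (or the kernel-density ratio suggested in the comments) may read better.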

If the levels of the factor don't matter, there are some nice examples here of plots with too many points. You could also look at some of the examples on the ggplot2 website or in the R Cookbook.

Andy Clifton

Transparency could help. As @BenBolker points out, it is easily achieved with adjustcolor:

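# one semi-transparent color per level of event_type (level 1 black, level 2 green)
colvec = adjustcolor(c("black", "green"), alpha.f = 0.2)
plot(snb$px, snb$pz,
     col = colvec[snb$event_type],  # the factor's integer codes pick the color
     xlim = c(-2, 2),
     ylim = c(1, 6))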

Transparency is also built into ggplot2, through the alpha argument:

require(ggplot2)
p <- ggplot(data = snb, aes(x = px, y = pz, color = event_type)) +
    geom_point(alpha = 0.2)
print(p)
Gregor Thomas
  • You don't even need the *scales* package; `colvec=adjustcolor(c("red","green","blue"),alpha=0.5); ... col=colvec[snb$event_type] ...` – Ben Bolker Jul 19 '13 at 02:35
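The adjustcolor() approach above works because indexing a character vector with a factor uses the factor's integer codes, so each point picks up the color for its level. A small illustration on made-up values (the four points and the level names are hypothetical, not the asker's data):

```r
# Hypothetical two-level factor; levels sort alphabetically, so "ball" is level 1
event_type <- factor(c("called_strike", "ball", "ball", "called_strike"))

# One color per level, in level order; alpha.f = 0.2 makes them mostly transparent
colvec <- adjustcolor(c("black", "green"), alpha.f = 0.2)

# A factor used as an index is converted to its integer codes (2, 1, 1, 2)
point_cols <- colvec[event_type]

plot(c(0.1, -0.5, 1.2, 0.3), c(2.5, 4.8, 1.5, 3.0),
     col = point_cols, pch = 16, xlab = "px", ylab = "pz")
legend("topright", legend = levels(event_type), col = colvec, pch = 16)
```

Because the color order follows levels(event_type), it is worth checking the level order (or setting it explicitly with factor(..., levels = ...)) before assuming which level gets which color.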