
Below is R code that attempts to create a heat map using geom_tile. Within my plot, I need to draw the outline of a box and a separate set of x/y coordinates (platelocside and platelocheight in the code) with a fill (exitspeed in the code) to complete the heat map. Here is the current structure of the data frame that I want to plot (called "df" in the code).

structure(list(platelocheight = c(2.594, 3.803, 3.254, 3.599, 
3.617, 3.297, 2.093, 3.611, 2.842, 3.316, 2.872, 3.228, 3.633, 
4.28, 3.309, 2.8, 2.632, 3.754, 2.207, 3.604, 3.443, 2.188, 3.452, 
2.553, 3.382, 3.067, 2.986, 2.785, 2.567, 3.804), platelocside = c(0.059, 
-1.596, -0.65, -0.782, -0.301, -0.104, 0.057, -0.807, 0.003, 
1.661, 0.088, -0.32, -1.115, -0.146, -0.364, -0.952, 0.254, 0.109, 
-0.671, -0.803, -0.212, -0.069, -0.09, -0.472, 0.434, 0.337, 
0.723, 0.508, -0.197, -0.635), exitspeed = c(69.891, 73.352, 
83.942, 85.67, 79.454, 85.277, 81.078, 73.573, 77.272, 59.263, 
97.343, 91.436, 76.264, 83.479, 47.576, 84.13, 60.475, 61.093, 
84.54, 69.959, 88.729, 88.019, 82.18, 83.684, 86.296, 90.605, 
79.945, 59.899, 62.522, 77.75)), .Names = c("platelocheight", 
"platelocside", "exitspeed"), row.names = c(NA, 30L), class = "data.frame")

When I run the code, I get the outline of the box in my output, but the other data frame (df) does not plot. Does anyone know how to use geom_tile so that it plots two separate data frames? Thanks in advance!

library(RODBC)
library(ggplot2)


con=odbcConnect('ID',uid='username', pwd = 'password')

df=sqlQuery(con,"select platelocheight, platelocside, exitspeed from tm_sample where pitchcall='InPlay' 
and exitspeed is not null")

topKzone <- 3.5
botKzone <- 1.6
inKzone <- -0.95
outKzone <- 0.95
kZone <- data.frame(
  x=c(inKzone, inKzone, outKzone, outKzone, inKzone),
  y=c(botKzone, topKzone, topKzone, botKzone, botKzone)
)

ggplot(kZone, aes(x,y)) +
  geom_tile(data=df, aes(x=platelocside, y=platelocheight, fill= exitspeed)) +
  scale_fill_distiller(palette = "Spectral") +
  geom_path(lwd=1.5, col="black") +
  coord_fixed() 
Nate Walker
  • Try adding `inherit.aes = FALSE` into your `geom_tile()` line for a start. But to get more specific help, you may want to provide the result of `dput(df)` (or at least `dput(head(df))`) so we know what it's like. – Z.Lin Aug 03 '18 at 04:05
  • Thanks for the advice @Z.Lin! Unfortunately, the inherit.aes addition did not make a difference. I provided the result of the df in case you have any other ideas. Thanks again! – Nate Walker Aug 03 '18 at 04:38

1 Answer


The problem is not in the use of two data frames, but rather in how geom_tile() itself behaves with this kind of data.

If you swap geom_tile for a different geom, say geom_point or geom_hex, you'll see that the plot renders fine:

library(ggplot2)
ggplot(kZone, aes(x,y)) +
  geom_hex(data=df, aes(x=platelocside, y=platelocheight, col=exitspeed)) +
  scale_fill_distiller(palette = "Spectral") +
  geom_path(lwd=1.5, col="black") +
  coord_fixed() 

Produces this: [image]

Understanding geom_tile

geom_tile is not a good choice for your data because both x and y are continuous. Continuous coordinates suit something like a scatterplot better than a heatmap-like graphic, whereas geom_tile works best when x and y are discrete.

You can see an example:

ggplot(mtcars, aes(x=as.factor(gear), y=as.factor(cyl), fill=hp))+
    geom_tile()


Compare that with calling it on two continuous variables:

ggplot(mtcars, aes(x=wt, y=mpg, fill=hp))+
    geom_tile()

You'll get tiles so small that it looks as if nothing was plotted.
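
To see why the tiles are so small, here is a minimal sketch using the same mtcars example. When no width or height is given, geom_tile sizes each tile from the spacing of the data, which ggplot2's resolution() reports as the smallest gap between distinct values. The names tile_w and tile_h, and the values assigned to them, are assumptions picked by eye for illustration and are not part of the original answer.

```r
library(ggplot2)

# Smallest gap between distinct values on each axis; geom_tile() falls back
# to these when width/height are not supplied, hence the tiny tiles.
res_wt  <- resolution(mtcars$wt, zero = FALSE)
res_mpg <- resolution(mtcars$mpg, zero = FALSE)

# Assumed tile dimensions in data units (illustrative only; tune for your data).
tile_w <- 0.2
tile_h <- 1

# Same plot as above, but with an explicit tile size so the tiles are visible.
p_tiles <- ggplot(mtcars, aes(x = wt, y = mpg, fill = hp)) +
  geom_tile(width = tile_w, height = tile_h)
p_tiles
```

Tiles drawn this way can overlap where points sit close together, so this is only a way of making the default behaviour visible, not necessarily a good heat map.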

Going back to your question, the df that you're using has platelocside and platelocheight both as numeric, continuous variables. That makes geom_tile a less-than-ideal choice. If you do insist on using geom_tile, then I would use one of these two solutions:

Solution 1

Map exitspeed to col instead of fill and increase size, so that the outline of each tiny tile is drawn thick enough to look like a point (since x and y are not factor variables):

library(ggplot2)
ggplot(kZone, aes(x,y)) +
  geom_tile(data=df, aes(x=platelocside, y=platelocheight, col=exitspeed), size=4) +
  scale_color_distiller(palette = "Spectral") +
  geom_path(lwd=1.5, col="black") +
  coord_fixed() 

Solution 2

Round x and y so that they take a small set of discrete values, which then behave like factor levels:

df$h <- round(df$platelocheight)
df$s <- round(df$platelocside)

ggplot(kZone, aes(x,y)) +
  geom_tile(data=df, aes(x=s, y=h, fill=exitspeed)) +
  scale_fill_distiller(palette = "Spectral") +
  geom_path(lwd=1.5, col="black") +
  coord_fixed() 

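One caveat with the rounding approach: rows that round to the same cell are drawn on top of one another, so each tile ends up showing a single row's exitspeed. A possible variant, sketched below, lets ggplot2 do the binning with stat_summary_2d() and fills each bin with the mean exitspeed. This is a swapped-in technique rather than part of the original answer. The data frame df_small holds the first six rows of the question's data so that the sketch runs on its own, and bin_size is an assumed value.

```r
library(ggplot2)

# First six rows of the question's df, repeated so this sketch is self-contained.
df_small <- data.frame(
  platelocheight = c(2.594, 3.803, 3.254, 3.599, 3.617, 3.297),
  platelocside   = c(0.059, -1.596, -0.65, -0.782, -0.301, -0.104),
  exitspeed      = c(69.891, 73.352, 83.942, 85.67, 79.454, 85.277)
)

# Strike-zone outline, using the corner values from the question.
kZone <- data.frame(
  x = c(-0.95, -0.95, 0.95, 0.95, -0.95),
  y = c(1.6, 3.5, 3.5, 1.6, 1.6)
)

# Assumed bin size in data units; smaller values give a finer grid.
bin_size <- 0.5

p_binned <- ggplot(df_small, aes(x = platelocside, y = platelocheight)) +
  # Average exitspeed within each bin and draw one tile per non-empty bin.
  stat_summary_2d(aes(z = exitspeed), fun = mean,
                  binwidth = c(bin_size, bin_size)) +
  scale_fill_distiller(palette = "Spectral") +
  # The outline comes from its own data frame, so it does not inherit the
  # plot-level aesthetics.
  geom_path(data = kZone, aes(x, y), inherit.aes = FALSE,
            lwd = 1.5, col = "black") +
  coord_fixed()
p_binned
```

With the full df in place of df_small, the same call should give a coarse heat map of average exit speed over the strike zone.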

onlyphantom
  • That's great advice, thanks @onlyphantom! I appreciate you taking the time to write and explain all of that! Quick follow-up question. For this type of data in the baseball industry, heat maps are very popular when portraying this type of information. Would you have any recommendations on geom packages if I wanted to create a chart like what shows up in this link? http://baseballanalysts.com/rhb_rhp_copy.png – Nate Walker Aug 03 '18 at 06:02
  • No problem Nate, always happy to help. I took a look at the png and off the top of my head I would recommend you take a closer look at density charts offered in ggplot. Do a search on `stat_density_2d()` and that may just be what you need. Good luck! – onlyphantom Aug 03 '18 at 06:07
  • Awesome! Thanks so much for the help! – Nate Walker Aug 03 '18 at 06:11
  • Hey onlyphantom! I took your advice and have attempted to build a `stat_density_2d()` plot, but I'm having an issue with the fill variable. Would you mind taking a look at my most recent post if you have some time? https://stackoverflow.com/questions/51865464/adding-a-3rd-variable-to-a-stat-density-2d-plot Thanks so much! – Nate Walker Aug 15 '18 at 19:53