
I would like to plot multiple groups in a stat_density2d plot, with alpha values that reflect the number of observations in each group. However, the levels computed by stat_density2d seem to be normalized within each group, regardless of how many observations the group contains. For example:

library(ggplot2)  # movies shipped with ggplot2 at the time; it now lives in ggplot2movies
temp <- rbind(movies[1:2,], movies[movies$mpaa == "R" | movies$mpaa == "PG-13",])
ggplot(temp, aes(x=rating, y=length)) +
  stat_density2d(geom="tile", aes(fill=mpaa, alpha=..density..), contour=FALSE) +
  theme_minimal()

Creates a plot like this:

[image: the resulting plot]

Because I only included 2 points without MPAA ratings, their density looks much tighter and stronger than the densities of the other two groups, and so washes them out. I've looked at "Overlay two ggplot2 stat_density2d plots with alpha channels" and "Specifying the scale for the density in ggplot2's stat_density2d", but they don't really address this specific issue.
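To illustrate the normalization described above: a kernel density estimate integrates to roughly 1 whatever the sample size, so a tiny group can show a higher peak than a large one. The following is a minimal sketch on made-up data (the `small` and `large` data frames and the grid limits are assumptions for illustration), using `MASS::kde2d`, the estimator that stat_density2d relies on as far as I know. Multiplying each density by its group size gives a surface that integrates to the count instead.

```r
library(MASS)  # kde2d(); MASS ships with standard R installations

set.seed(1)
# Hypothetical stand-in data: a very small group and a large group
small <- data.frame(x = rnorm(5),    y = rnorm(5))
large <- data.frame(x = rnorm(2000), y = rnorm(2000))

# Estimate both densities on the same grid so they are comparable
lims <- c(-6, 6, -6, 6)
kd_small <- kde2d(small$x, small$y, n = 100, lims = lims)
kd_large <- kde2d(large$x, large$y, n = 100, lims = lims)

# Area of one grid cell, used to approximate the integral of each surface
cell <- diff(kd_small$x[1:2]) * diff(kd_small$y[1:2])
mass_small <- sum(kd_small$z) * cell  # close to 1 despite only 5 points
mass_large <- sum(kd_large$z) * cell  # also close to 1

# Scaling by the number of observations makes each surface integrate to its count
count_small <- sum(kd_small$z * nrow(small)) * cell
count_large <- sum(kd_large$z * nrow(large)) * cell
```

Here `mass_small` and `mass_large` should both come out near 1, while `count_small` and `count_large` should come out near 5 and 2000, which is the kind of scaling the question is after.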

Ultimately, with my real data, I have "power" samples from discrete 2D locations for multiple conditions, and I am trying to plot their relative powers and spatial distributions. I am duplicating points at each location in proportion to its power, but as a result, low-power conditions with just a few locations look the strongest when plotted with stat_density2d. Please let me know if there is a better way of going about this!

Thanks!

Guy

2 Answers


stat_binhex, which understands ..count.. in addition to ..density.., may work for you:

ggplot(temp, aes(x=rating,y=length)) + 
    stat_binhex(geom="hex", aes(fill = mpaa, alpha=..count..)) + 
    theme_minimal()

You may want to adjust the bin width, though.

jaimedash
  • Thanks! That should definitely work better in some situations. Unfortunately, I need something with more smooth interpolation between datapoints, like what's afforded through stat_density2d. – Guy Mar 16 '15 at 13:59
  • Understood. Unfortunately, this seems impossible currently. You may be able to hack it by inserting eg `df$count <- sapply(split(df, df$group), length)` at line 78 and making an appropriate change to line 82 in the [stat-density-2d.r](https://github.com/hadley/ggplot2/blob/4bb9270ef4d5d5062353438fd99d17b6f6de98a2/R/stat-density-2d.r) from ggplot2 source – jaimedash Mar 16 '15 at 18:48
  • Oops, that should be `df$count <- sapply(split(df, df$group), nrow)`, but regardless this suggestion doesn't seem to work (perhaps not surprisingly). – jaimedash Mar 16 '15 at 19:03

Not the most elegant R code, but this seems to work. I normalize my real data a bit differently, but this gets the idea across: I use a for loop in which I find the average power for each condition (in this example, the number of observations in the group stands in for it) and add a new stat_density2d layer whose alpha is scaled by that value.

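library(ggplot2)  # movies shipped with ggplot2 at the time; it now lives in ggplot2movies
temp <- rbind(movies[1:2,], movies[movies$mpaa == "R" | movies$mpaa == "PG-13",])
mpaa <- unique(temp$mpaa)
p <- ggplot() + theme_minimal()
for (ii in seq_along(mpaa)) {
  # share of all observations that fall in this group; keeps alpha within [0, 1]
  ratio <- sum(temp$mpaa == mpaa[ii]) / nrow(temp)
  p <- p + stat_density2d(data = temp[temp$mpaa == mpaa[ii],],
                          aes(x = rating, y = length, fill = mpaa),
                          geom = "polygon",
                          contour = TRUE,
                          alpha = ratio,       # one fixed alpha per layer, scaled by group size
                          linetype = "blank")  # no outline around the contour polygons
}
print(p)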
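A possible variation on the loop above, not the approach actually used here: compute each group's density yourself on a shared grid, multiply it by the group size, and map alpha to the result in a single layer. This is only a sketch under assumptions. `count_scaled_density()` is a hypothetical helper, the `demo` data is made up, and `MASS::kde2d` is used directly instead of going through stat_density2d. The plotting part only runs if ggplot2 is installed.

```r
library(MASS)  # kde2d(); MASS ships with standard R installations

# Hypothetical helper: per-group 2D densities on a shared grid, scaled by group size
count_scaled_density <- function(df, x, y, group, n = 50) {
  lims <- c(range(df[[x]]), range(df[[y]]))  # same grid limits for every group
  pieces <- lapply(split(df, df[[group]], drop = TRUE), function(d) {
    kd <- kde2d(d[[x]], d[[y]], n = n, lims = lims)
    grid <- expand.grid(x = kd$x, y = kd$y)   # x varies fastest, matching as.vector(kd$z)
    grid$scaled <- as.vector(kd$z) * nrow(d)  # density times number of observations
    grid$group <- d[[group]][1]
    grid
  })
  do.call(rbind, pieces)
}

set.seed(42)
# Made-up data: a sparse "low" condition and a dense "high" condition
demo <- data.frame(
  x = c(rnorm(10, 0), rnorm(500, 3)),
  y = c(rnorm(10, 0), rnorm(500, 3)),
  condition = rep(c("low", "high"), times = c(10, 500))
)
dens <- count_scaled_density(demo, "x", "y", "condition")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # geom_tile draws one rectangle per row, so overlapping groups blend via alpha
  p2 <- ggplot(dens, aes(x = x, y = y, fill = group, alpha = scaled)) +
    geom_tile() +
    theme_minimal()
  print(p2)
}
```

Because both groups share one alpha scale, the sparse condition should come out faint relative to the dense one. Weighting by a measured power instead of `nrow(d)` would be a small change inside the helper.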
Guy