I have a set of numeric x,y measurements from two conditions and four types. I would like to add the densities (smoothed histograms) of x and y for each condition to the x,y scatterplots, as in http://stackoverflow.com/questions/31168944 . That question describes a solution (thanks!), but I also want to facet the ggplot by the four types. ggplot can handle this for the x dimension using stat_density, but for y I think I must compute the densities for all 8 combinations of condition × type, store them in a data frame, and read the plotting data from there. If so, the issue becomes how to compute the data frame containing the densities. For example:
set.seed(123)
dim1 <- c(rnorm(100, mean=1), rnorm(100, mean=4))
dim2 <- rnorm(200, mean=1)
condition <- factor(c(rep("a", 100), rep("b", 100)))
type <- factor(rep(c("I", "J", "K", "L"), 50))  # 50 repeats give 200 values, matching the other columns
mydf <- data.frame(dim1, dim2, condition, type)
To get dim2's densities for both levels of condition, I can do this:
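ds <- do.call(rbind, lapply(unique(mydf$condition), function(lev) {
  # density of dim2 within one level of condition
  dens <- with(mydf, density(dim2[condition == lev]))
  # x and y are swapped so the curve runs along the plot's y axis
  data.frame(x = dens$y, y = dens$x, condition = lev)
}))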
and then add the result to the scatterplot using geom_path. But I want the densities not just for the levels in unique(mydf$condition), but for all 8 combinations of mydf$condition and mydf$type.
This might be a job for tapply or aggregate, but I can't figure out how to get the results into a data frame that ggplot can interpret. It might also be done by nesting lapply calls to run through the levels of $condition and $type (I tried and failed). I also attempted variants of
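For reference, one base-R direction that might avoid nesting lapply calls is to split() the data frame on both factors at once and run density() on each piece. This is only a minimal sketch under assumptions: the names groups and ds are illustrative, the example data are rebuilt inside the block so it runs on its own, and it has not been tried on the real measurements.

```r
# Example data as in the question (type repeated 50 times so every column has 200 rows)
set.seed(123)
mydf <- data.frame(
  dim1      = c(rnorm(100, mean = 1), rnorm(100, mean = 4)),
  dim2      = rnorm(200, mean = 1),
  condition = factor(rep(c("a", "b"), each = 100)),
  type      = factor(rep(c("I", "J", "K", "L"), 50))
)

# One data frame per condition x type combination (8 pieces here)
groups <- split(mydf, list(mydf$condition, mydf$type), drop = TRUE)

# Estimate dim2's density within each piece; x and y are swapped so the
# curve would run along the vertical axis of the scatterplot
ds <- do.call(rbind, lapply(groups, function(g) {
  dens <- density(g$dim2)
  data.frame(x = dens$y, y = dens$x,
             condition = g$condition[1], type = g$type[1])
}))
rownames(ds) <- NULL
```

If this works as intended, ds holds 512 density points for each of the 8 combinations and carries both factor columns, so it could presumably be passed to geom_path(data = ds, aes(x, y, colour = condition)) in a plot faceted with facet_wrap(~ type).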
densities <- mydf %>% group_by(condition, type) %>% summarise(mydensity = density(dim2))
but summarise expects a single value per group, not the 512-point output of density.
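A dplyr variant that might get around the one-value-per-group limit is do(), which accepts an expression returning a whole data frame per group. This is a hedged sketch rather than a confirmed solution: it assumes dplyr is installed, rebuilds the example data so it is self-contained, and uses do(), which recent dplyr versions mark as superseded in favour of reframe().

```r
library(dplyr)

# Example data as in the question (type repeated 50 times so every column has 200 rows)
set.seed(123)
mydf <- data.frame(
  dim1      = c(rnorm(100, mean = 1), rnorm(100, mean = 4)),
  dim2      = rnorm(200, mean = 1),
  condition = factor(rep(c("a", "b"), each = 100)),
  type      = factor(rep(c("I", "J", "K", "L"), 50))
)

densities <- mydf %>%
  group_by(condition, type) %>%
  do({
    # inside do(), `.` is the data for the current group
    dens <- density(.$dim2)
    # return a multi-row data frame; x and y swapped for a vertical curve
    data.frame(x = dens$y, y = dens$x)
  }) %>%
  ungroup()
```

The grouping columns are kept alongside x and y, so the result should have the same shape as the two-condition ds above, with an extra type column available for faceting.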
I suspect there are several ways to do this - if anyone can set me on the right path I'll be grateful.