I have a set of numeric x,y measurements from two conditions and four types. I want to add the marginal densities (smoothed histograms) of x and y for each condition to an x,y scatterplot, as in http://stackoverflow.com/questions/31168944 . The link describes a solution (thanks!), but I also want to facet the ggplot by the four types. ggplot can handle this for the x dimension using stat_density, but for y, I think I must compute the densities for all 8 combinations of condition X type, store them in a data frame, and plot from there. If so, the issue becomes how to compute that data frame of densities. For example:

set.seed(123)
dim1 <- c(rnorm(100, mean=1), rnorm(100, mean=4))
dim2 <- rnorm(200, mean=1)
condition <- factor(c(rep("a", 100), rep("b", 100)))
type <- factor(rep(c("I", "J", "K", "L"), 50))
mydf <- data.frame(dim1, dim2, condition, type)

To get dim2's densities for both levels of condition, I can do this:

ds <- do.call(rbind, lapply(unique(mydf$condition), function(lev) {
  dens <- with(mydf, density(dim2[condition == lev]))
  # x and y are swapped on purpose so the curve runs along the plot's y axis
  data.frame(x = dens$y, y = dens$x, condition = lev)
}))

and then add these to the scatterplot using geom_path. But I want the densities not just for each level in unique(mydf$condition), but for all 8 combinations of mydf$condition and mydf$type.

This could be a tapply or aggregate, but I can't figure out how to get the results into a data frame that ggplot will be able to interpret; it could be nesting lapply calls to run through the levels of $condition and $type (I tried and failed). I also attempted variants of
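One way to avoid nesting lapply calls is to split the data frame on both factors at once and loop over the pieces. The following is a minimal sketch in base R, not a definitive solution: it rebuilds the example data (assuming type is meant to have 200 values, hence rep(..., 50)), and the name `groups` is introduced here purely for illustration.

```r
# Example data as in the question; type is repeated 50 times (an assumption)
# so that every column has 200 rows.
set.seed(123)
mydf <- data.frame(
  dim1 = c(rnorm(100, mean = 1), rnorm(100, mean = 4)),
  dim2 = rnorm(200, mean = 1),
  condition = factor(c(rep("a", 100), rep("b", 100))),
  type = factor(rep(c("I", "J", "K", "L"), 50))
)

# One data frame per condition x type combination (8 pieces here).
groups <- split(mydf, list(mydf$condition, mydf$type), drop = TRUE)

# Compute the density of dim2 within each piece and stack the results.
ds <- do.call(rbind, lapply(groups, function(g) {
  dens <- density(g$dim2)
  # x and y are swapped so the curve runs along the plot's y axis
  data.frame(x = dens$y, y = dens$x,
             condition = g$condition[1], type = g$type[1])
}))
rownames(ds) <- NULL
```

The result has columns x, y, condition and type, with 512 rows per combination (the default number of points returned by density), so it should be usable as the data argument of geom_path in a plot faceted by type.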

densities <- mydf %>% group_by(condition, type) %>% summarise(mydensity = density(dim2))

but summarise doesn't want a 512-length vector value.

I suspect there are several ways to do this - if anyone can set me on the right path I'll be grateful.

D Swingley
  • Not sure whether I understood the desired output.. You could try `mydf %>% group_by(condition, type) %>% do({ dens <- density(dim2); data_frame(x = dens$x, y = dens$y) })` – talat Jul 06 '15 at 19:31
  • @docendodiscimus - thanks. when I try this it produces a df with identical values for each level of condition X type. But in principle I think this format would be fine (or I can rearrange it if needed), i.e. a df with colnames condition, type, x, y. – D Swingley Jul 06 '15 at 20:16
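
On the identical values mentioned in the comments: inside do(), columns are not looked up by bare name, so density(dim2) most likely picks up the full dim2 vector from the global environment rather than each group's values. Referring to the current group through the `.` pronoun should give per-group densities. This is a hedged sketch assuming dplyr is installed and that type is meant to have 200 values (rep(..., 50)); it uses data.frame rather than data_frame, which newer versions of dplyr and tibble deprecate.

```r
library(dplyr)

# Example data as in the question; type repeated 50 times (an assumption)
# so that every column has 200 rows.
set.seed(123)
mydf <- data.frame(
  dim1 = c(rnorm(100, mean = 1), rnorm(100, mean = 4)),
  dim2 = rnorm(200, mean = 1),
  condition = factor(c(rep("a", 100), rep("b", 100))),
  type = factor(rep(c("I", "J", "K", "L"), 50))
)

densities <- mydf %>%
  group_by(condition, type) %>%
  do({
    # `.` is the data for the current group, so .$dim2 is group-specific
    dens <- density(.$dim2)
    # x and y are swapped so the curve runs along the plot's y axis
    data.frame(x = dens$y, y = dens$x)
  }) %>%
  ungroup()
```

The output keeps condition and type as columns alongside x and y, which matches the format described in the comment above.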
