Split violin plot with points on top to indicate info

Question

This is a followup post from here and here

I have successfully implemented the split violin ggplot2 for my data (two median estimator densities, for two cases) that need to be compared. Now, since i would like to add some confidence interval. I m following the code posted in the links above:

EDIT: A reproducible example

tmp <- rnorm(1000,0,1)
tmp.2 <- rnorm(1000,0,1)
x.1 <- density(tmp)
y.1 <- density(tmp.2)

Here, i m making the densities, extracting the (x,y) pairs. Then i m getting the quantiles back,

# Make densities
densities <- as.data.frame(c(x.1$x,y.1$x))
colnames(densities) <- "loc"
densities$dens <- c(x.1$y,y.1$y)
densities$drop_case <- c(rep("B",512),rep("S",512))
densities$dens <- ifelse(densities$drop_case=="B",densities$dens*-1,densities$dens)
densities$dens <- ifelse(densities$drop_case=="S",densities$dens*1,densities$dens)

conf <- as.data.frame(c(quantile(tmp,c(0.025,0.975))[1],quantile(tmp,c(0.025,0.975))[2],quantile(tmp.2,c(0.025,0.975))[1],quantile(tmp.2,c(0.025,0.975))[2]))
colnames(conf) <- "intervals"
conf$drop_case <- c(rep("B",2),rep("S",2))
conf$length <- rep(1000,4)

Now here i am trying to extract the values inside the densities, as was noted in the linked posts

Find data points in densities

val.tmp <- rep(0,4)
val.tmp.2 <- rep(0,4)
for (i in 1:4) {
x.here <- densities$loc
y.here <- densities$dens
your.number<- conf$intervals[i]
pos.tmp <- which(abs(x.here-your.number)==min(abs(x.here-your.number)))
val.tmp[i] <- x.here[pos.tmp]
val.tmp.2[i] <- y.here[pos.tmp]
}
conf$positions <- val.tmp
conf$length <- val.tmp.2

conf$length <- ifelse(conf$drop_case=="B",conf$length*-1,conf$length)
conf$length <- ifelse(conf$drop_case=="S",conf$length*1,conf$length)

ggplot(densities,aes(dens, loc, fill = factor(drop_case)))+
geom_polygon()+
scale_x_continuous(breaks = 0, name = info$Name)+
ylab('Estimator Density') +
theme(axis.title.x = element_blank())+
geom_point(data = conf, aes(x = positions, y = length, fill = factor(drop_case), group = factor(drop_case))
         ,shape = 21, colour = "black", show.legend = FALSE)

Then unfortuantely I am facing the following, the points are not mapped on the densities but are rather mapped on the plane.

The aesthetics that you _map_ to a variable should be within `aes()` (i.e. `x`, `y`, `fill` and `group`, but those that you _set_ to a value should outside the `aes()`, i.e. `shape` and `colour`. — Axeman, Oct 08 '16 at 15:20
Ok, i changed that and also fixed the `breaks = 1` argument. Now, i m not getting the split violin at all. — Trelokoritso, Oct 08 '16 at 15:33
Your code isn't reproducible, since we don't have `block.1`. Have a look at: [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) — Axeman, Oct 08 '16 at 16:09
Sorry, but it is not clear to me what those points are supposed to convey. Why aren't their y values mapped to `intervals`, and what kind of x value should they have? — Axeman, Oct 10 '16 at 09:40
These points are the 95 confidence intervals of the 2 distributions, so i would prefer to map them on the distributions. not on the general plot. — Trelokoritso, Oct 10 '16 at 09:46

score 0 · Accepted Answer · answered Oct 10 '16 at 10:12

There is a bunch of little mistakes in the code. Firstly, within that for loop, you can't set x.here and y.here to all of the density and location values, since that includes both groups. Secondly, since the signs are already changed in densities there is no need to use those ifelse statements afterwards. Thirdly, you would only need the top ifelse anyway, since the bottom one does absolutely nothing. Finally, you had the x and y mappings in geom_point the wrong way around!

There is a bunch of other things one could change to make the code more understandable and pretty, but I'm on limited time, so I'll leave those for what they are.

Below the full adjusted code:

tmp <- rnorm(1000,0,1)
tmp.2 <- rnorm(1000,0,1)
x.1 <- density(tmp)
y.1 <- density(tmp.2)  

# Make densities
densities <- as.data.frame(c(x.1$x,y.1$x))
colnames(densities) <- "loc"
densities$dens <- c(x.1$y,y.1$y)
densities$drop_case <- c(rep("B",512),rep("S",512))
densities$dens <- ifelse(densities$drop_case=="B",densities$dens*-1,densities$dens)

conf <- as.data.frame(c(quantile(tmp,c(0.025,0.975)), quantile(tmp.2,c(0.025,0.975))))
colnames(conf) <- "intervals"
conf$drop_case <- c(rep("B",2),rep("S",2))
conf$length <- rep(1000,4)

val.tmp <- rep(0,4)
val.tmp.2 <- rep(0,4)
for (i in 1:4) {
  x.here <- densities$loc[densities$drop_case == conf$drop_case[i]]
  y.here <- densities$dens[densities$drop_case == conf$drop_case[i]]
  your.number<- conf$intervals[i]
  pos.tmp <- which(abs(x.here-your.number)==min(abs(x.here-your.number)))
  val.tmp[i] <- x.here[pos.tmp]
  val.tmp.2[i] <- y.here[pos.tmp]
}
conf$positions <- val.tmp
conf$length <- val.tmp.2

ggplot(densities, aes(dens, loc, fill = drop_case)) +
  geom_polygon()+
  ylab('Estimator Density') +
  theme(axis.title.x = element_blank())+
  geom_point(data = conf, aes(x = length, y = positions, fill = drop_case),
             shape = 21, colour = "black", show.legend = FALSE)

This results in:

I would personally prefer a plot with line segments:

ggplot(densities, aes(dens, loc, fill = factor(drop_case)))+
  geom_polygon()+
  ylab('Estimator Density') +
  theme(axis.title.x = element_blank())+
  geom_segment(data = conf, aes(x = length, xend = 0, y = positions, yend = positions))

Great! Thank you so much for the clear and concise solution! — Trelokoritso, Oct 13 '16 at 13:25

Split violin plot with points on top to indicate info

EDIT: A reproducible example

Find data points in densities

1 Answers1