ggplot fitted distributions aren't quite level with the x axis

Question

I am trying to model output data from the "mixtools" package in a similar manner to how jhoward did here with minor changes. The examples provided work as expected and his solution works with my data, however the fitted distributions seem...tilted.

Here is a link to my output graph.

Notice how both distributions are angled above the x axis slightly.

Using an artificial dataset doesn't show the problem anymore.

`sample.ratio`=c(rnorm(120000,0.386,0.0842),rnorm(200000,0.653,0.1153))
`mixmdl_k2.sample`=normalmixEM(`sample.ratio`, k=2)

ggplot_mixEM <- function(EM) {
  require(ggplot2)
  x       <- with(EM,seq(min(x),max(x),len=1000))
  pars    <- with(EM,data.frame(comp=colnames(posterior), mu, sigma,lambda))
  em.df   <- data.frame(x=rep(x,each=nrow(pars)),pars)
  em.df$y <- with(em.df,lambda*dnorm(x,mean=mu,sd=sigma))
  ggplot(data.frame(x=EM$x),aes(x,y=..density..)) + 
    geom_histogram(fill=NA,color="black")+
    geom_polygon(data=em.df,aes(x,y,fill=comp),color="grey50", alpha=0.5)+
    scale_fill_discrete("Component\nModes",labels=format(em.df$mu,digits=3))+
    geom_density(color="red",linetype="dotted")
}
ggplot_mixEM(`mixmdl_k2.sample`)

The distribution fills are now level, though they are also no longer constrained to the (0,1) interval.

About my data: it ranges explicitly from 0-1, and the bimodal distribution modeled by mixtools is expected. Increasing the x scale with scale_x_continuous(limits=c(-0.1,1.1)) didn't solve the issue. The first solution from the link above, provided by Spacedman, also worked on my data, but gave the same "tilted plot" results.

Does anyone know why this is happening and how to fix this tilt? Is there a way to force extend the shading to the x axis?

Thank you.

Edit: added in sample code.

You should include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) here in the question with sample input data and the code you used to make the plot. — MrFlick, Mar 29 '17 at 21:15
I added in a sample of my code, but the problem doesn't show up anymore with an artificial dataset. — Michael, Mar 29 '17 at 21:30
Well it would be useful to determine how your data is different from simulated data. Right now you seem to be calculating the polygon only over the range of `x` but the distribution may not be 0 at the extremes. Probably best to extend the range over which you calculate values. Also, where did you pick up the odd habit of quoting variable names in backticks? — MrFlick, Mar 29 '17 at 21:33
The backticking is due to the fact that most of my data has numbers in their names (as you see in a second), and RStudio doesn't like it when a number is the first character in a variable. I've been trying to not start variables with numbers recently, but I digress... I'm not sure how the sample and real datasets could be different. ``> head(`sample.ratio`) [1] 0.4416384 0.4001051 0.5458209 0.2530325 0.4094072 0.3403063 > head(`20-03_UG.Ratio`) [1] 0.6800000 0.2826087 0.6153846 0.3333333 0.4137931 0.3333333`` What step should be modified to extend the calculation range? — Michael, Mar 29 '17 at 21:43
Turns out it was the `x <- with(EM,seq(min(x),max(x),len=1000))` line. I changed it to max(1.1) and it worked out. — Michael, Mar 29 '17 at 21:50

score 0 · Answer 1 · answered Mar 29 '17 at 21:53

Changing the line

x <- with(EM,seq(min(x),max(x),len=1000))

to

x       <- with(EM,seq(min(x),max(1.1),len=1000))

seems to have fixed it.
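A possible explanation, in line with MrFlick's comment: `geom_polygon` closes each shape by joining the last vertex back to the first. If a component's density is not yet close to zero at `min(x)` or `max(x)`, that closing edge runs between two points at different heights rather than along the axis, which would look like a tilt. Widening the grid pushes both ends toward zero (note that `max(1.1)` is simply `1.1`). If you would rather keep the curves on the observed range, another option is to add a zero-height vertex at each end of every component's curve. The following is only a sketch: `close_on_axis` is a hypothetical helper name, not part of ggplot2 or mixtools, and the toy curve uses the simulated parameters from the question rather than the real data.

```r
# Sketch only: `close_on_axis` is a hypothetical helper, not a ggplot2/mixtools function.
# It adds a vertex at y = 0 to both ends of one component's curve, so the edge that
# geom_polygon draws to close the shape lies on the x axis.
close_on_axis <- function(df) {
  # df: rows for a single component, ordered by x, with at least columns x and y
  first <- df[1, ]
  last  <- df[nrow(df), ]
  first$y <- 0   # same x as the first point, dropped to the axis
  last$y  <- 0   # same x as the last point, dropped to the axis
  rbind(first, df, last)
}

# Toy curve standing in for lambda * dnorm(x, mu, sigma), evaluated on [0, 1] only.
# The weight 0.625 = 200000 / 320000 is an assumption taken from the simulated sample.
x <- seq(0, 1, length.out = 1000)
curve_df <- data.frame(
  x    = x,
  y    = 0.625 * dnorm(x, mean = 0.653, sd = 0.1153),
  comp = "comp.2"
)

closed <- close_on_axis(curve_df)
```

Inside `ggplot_mixEM` this could be applied per component just before the `ggplot()` call, for example with `em.df <- do.call(rbind, lapply(split(em.df, em.df$comp), close_on_axis))`. I have not tried it on the real dataset.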

— Michael, Mar 29 '17 at 21:53
