I am trying to plot output from the "mixtools" package in much the same way jhoward did here, with minor changes. The examples provided work as expected and his solution works with my data; however, the fitted distributions seem... tilted.

Here is a link to my output graph.

Notice how both distributions are angled above the x axis slightly.

Using an artificial dataset doesn't show the problem anymore.

library(mixtools)
`sample.ratio`=c(rnorm(120000,0.386,0.0842),rnorm(200000,0.653,0.1153))
`mixmdl_k2.sample`=normalmixEM(`sample.ratio`, k=2)

ggplot_mixEM <- function(EM) {
  require(ggplot2)
  x       <- with(EM,seq(min(x),max(x),len=1000))
  pars    <- with(EM,data.frame(comp=colnames(posterior), mu, sigma,lambda))
  em.df   <- data.frame(x=rep(x,each=nrow(pars)),pars)
  em.df$y <- with(em.df,lambda*dnorm(x,mean=mu,sd=sigma))
  ggplot(data.frame(x=EM$x),aes(x,y=..density..)) + 
    geom_histogram(fill=NA,color="black")+
    geom_polygon(data=em.df,aes(x,y,fill=comp),color="grey50", alpha=0.5)+
    scale_fill_discrete("Component\nModes",labels=format(em.df$mu,digits=3))+
    geom_density(color="red",linetype="dotted")
}
ggplot_mixEM(`mixmdl_k2.sample`)

The distribution fills are now level, though they are also no longer constrained to the (0,1) interval.

About my data: it ranges explicitly from 0-1, and the bimodal distribution modeled by mixtools is expected. Increasing the x scale with scale_x_continuous(limits=c(-0.1,1.1)) didn't solve the issue. The first solution from the link above, provided by Spacedman, also worked on my data, but gave the same "tilted plot" results.

Does anyone know why this is happening and how to fix the tilt? Is there a way to force the shading to extend down to the x axis?

Thank you.

Edit: added in sample code.

Michael
  • You should include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) here in the question with sample input data and the code you used to make the plot. – MrFlick Mar 29 '17 at 21:15
  • I added in a sample of my code, but the problem doesn't show up anymore with an artificial dataset. – Michael Mar 29 '17 at 21:30
  • Well, it would be useful to determine how your data is different from simulated data. Right now you seem to be calculating the polygon only over the range of `x`, but the distribution may not be 0 at the extremes. Probably best to extend the range over which you calculate values. Also, where did you pick up the odd habit of quoting variable names in backticks? – MrFlick Mar 29 '17 at 21:33
  • The backticking is due to the fact that most of my data has numbers in their names (as you see in a second), and RStudio doesn't like it when a number is the first character in a variable. I've been trying to not start variables with numbers recently, but I digress... I'm not sure how the sample and real datasets could be different. ``> head(`sample.ratio`) [1] 0.4416384 0.4001051 0.5458209 0.2530325 0.4094072 0.3403063 > head(`20-03_UG.Ratio`) [1] 0.6800000 0.2826087 0.6153846 0.3333333 0.4137931 0.3333333`` What step should be modified to extend the calculation range? – Michael Mar 29 '17 at 21:43
  • Turns out it was the `x <- with(EM,seq(min(x),max(x),len=1000))` line. I changed it to max(1.1) and it worked out. – Michael Mar 29 '17 at 21:50
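MrFlick's point can be checked numerically. The sketch below uses the parameters of the simulated example from the question as stand-ins (an assumption: the values fitted to the real data are not shown) and evaluates each weighted component density at the two ends of a [0, 1] data range. `geom_polygon` closes each shape with a straight edge from the last vertex back to the first, so if the two end heights differ, that edge is slanted instead of lying on the x axis.

```r
# Parameters taken from the simulated example in the question; the values
# fitted to the real data are not shown there, so treat these as stand-ins.
mu     <- c(0.386, 0.653)
sigma  <- c(0.0842, 0.1153)
lambda <- c(120000, 200000) / 320000   # mixing weights implied by the sample sizes

# Weighted component density at a given x, one value per component
edge_height <- function(x) lambda * dnorm(x, mean = mu, sd = sigma)
y_left  <- edge_height(0)   # heights at the lower end of the data range
y_right <- edge_height(1)   # heights at the upper end of the data range

# The polygon's closing edge joins (1, y_right) back to (0, y_left);
# a visible difference between the two columns means a slanted base.
print(data.frame(comp = c("comp.1", "comp.2"), y_left, y_right))
```

With these stand-in values the second component is still clearly above zero at x = 1 while being practically zero at x = 0, which is consistent with the slight tilt described in the question.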

1 Answer

Changing the line

x <- with(EM,seq(min(x),max(x),len=1000))

to

x       <- with(EM,seq(min(x),max(1.1),len=1000))

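seems to have fixed it. Note that `max(1.1)` is just `1.1`, so the curves are now evaluated past the largest data value, presumably out to where both component densities are close to zero, and each polygon then closes along the x axis instead of along a slanted line.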
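An alternative that avoids hard-coding 1.1, and keeps the fills inside the data range, is to add a vertex at y = 0 on each end of every component curve so the polygon is explicitly closed on the baseline. The following is a minimal base R sketch, not a tested drop-in replacement: `closed_component_df` is a hypothetical helper name, and the parameter values are the ones from the simulated example in the question.

```r
# Build polygon vertices for each mixture component, with an extra vertex at
# y = 0 on both ends so the closing edge lies on the x axis.
# `closed_component_df` is a hypothetical helper, not part of mixtools.
closed_component_df <- function(mu, sigma, lambda, from, to, n = 1000) {
  x <- seq(from, to, length.out = n)
  pieces <- lapply(seq_along(mu), function(i) {
    y <- lambda[i] * dnorm(x, mean = mu[i], sd = sigma[i])
    data.frame(
      comp = paste0("comp.", i),
      x = c(from, x, to),   # repeat the two end x values...
      y = c(0, y, 0)        # ...and drop them to the baseline
    )
  })
  do.call(rbind, pieces)
}

# Stand-in parameters from the simulated example, evaluated on [0, 1]
em.df <- closed_component_df(
  mu = c(0.386, 0.653), sigma = c(0.0842, 0.1153),
  lambda = c(0.375, 0.625), from = 0, to = 1
)
```

With a real fit, the arguments would be `EM$mu`, `EM$sigma`, `EM$lambda`, `min(EM$x)` and `max(EM$x)`, and the resulting data frame can be passed to `geom_polygon(data=em.df, aes(x, y, fill=comp))` as in the question's function.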

Michael