R : function to generate a mixture distribution

Question

I need to generate samples from a mixed distribution

40% samples come from Gaussian(mean=2,sd=8)
20% samples come from Cauchy(location=25,scale=2)
40% samples come from Gaussian(mean = 10, sd=6)

To do this, i wrote the following function :

dmix <- function(x){
prob <- (0.4 * dnorm(x,mean=2,sd=8)) + (0.2 * dcauchy(x,location=25,scale=2)) + (0.4 * dnorm(x,mean=10,sd=6))
return (prob)
}

And then tested with:

foo = seq(-5,5,by = 0.01)
vector = NULL
for (i in 1:1000){
vector[i] <- dmix(foo[i])
}
hist(vector)

I'm getting a histogram like this (which I know is wrong) - histogram of the supposed distribution

What am I doing wrong? Can anyone give some pointers please?

just one thing i noticed: you can create the same plot using `hist(dmix(seq(-5,5,by = 0.01)))`without looping and `vector` — talat, May 05 '14 at 20:01
can you make a histogram of the random samples? `dmix <- function(x = 100) {prob <- c(rnorm(x * .4, mean = 2, sd = 8), rcauchy(x * .2, location = 25, scale = 2), rnorm(x * .4, mean = 10, sd = 6)); hist(prob)}; dmix()` — rawr, May 05 '14 at 20:02
@rawr I could using this - dmix2 <- function(x){ prob <- c(rnorm(x * .4, mean = 2, sd = 8),rcauchy(x * .2, location = 25, scale = 2),rnorm(x * .4, mean = 10, sd = 6)) return (prob) } hist(prob) But I need to get the probability of the sample belonging to the distribution. dnorm, dcauchy etc give the probability. I suppose I should use the rnorm and rcauchy then. [Can't add the histogram plot here] — Raaj, May 05 '14 at 20:14

score 14 · Accepted Answer · edited May 23 '17 at 12:24

14

There are of course other ways to do this, but the distr package makes it pretty darned simple. (See also this answer for another example and some more details about distr and friends).

library(distr)

## Construct the distribution object.
myMix <- UnivarMixingDistribution(Norm(mean=2, sd=8), 
                                  Cauchy(location=25, scale=2),
                                  Norm(mean=10, sd=6),
                                  mixCoeff=c(0.4, 0.2, 0.4))
## ... and then a function for sampling random variates from it
rmyMix <- r(myMix)

## Sample a million random variates, and plot (part of) their histogram
x <- rmyMix(1e6)
hist(x[x>-100 & x<100], breaks=100, col="grey", main="")

enter image description here

And if you'd just like a direct look at your mixture distribution's pdf, do:

plot(myMix, to.draw.arg="d")

enter image description here

edited May 23 '17 at 12:24

Community

1
1

answered May 05 '14 at 20:13

Josh O'Brien

159,210
26
366
455

This seems to be what I'm looking for. – Raaj May 05 '14 at 20:19
Oh this is exactly what I'm looking for! Many thanks! Didn't know about the existence of "distr". Did run into "mixtools". However that seems to be more for analysis rather than generation. – Raaj May 05 '14 at 20:21
@Raaj Great. Glad that helped. As I'm sure you just saw, this just scratches the surface of what Ruckdeschel et al. have put together! – Josh O'Brien May 05 '14 at 20:23
Might I bug you with one more question? It seems to me like [-100 to 100] is similar to truncation (or is it the same?) What is its purpose? I'll look into the documentation of course but do let me know if you could. Thanks. – Raaj May 05 '14 at 20:27
3

when you draw Cauchy random variables, if you don't cut off the tails this way then you end up with a histogram with a ridiculously large x-range (e.g. -1e-7 to +1e7), and then you can't see the body of the distribution very well at all. – Ben Bolker May 05 '14 at 20:29
Ah thank you. That escaped me for a moment. Will play with the parameters and observe again to imprint. – Raaj May 05 '14 at 20:34

score 2 · Answer 2 · answered Oct 27 '16 at 09:18

Always use R vectorization, if you can. Even if many values are actually discarded, it's often more efficient. (at least faster than the previous solution, and avoids extra - libraries)

rmy_ve = function(n){

##generation of (n x 3) matrix. 
##Each column is a random sample of size n from a single component of the mixture
temp = cbind(rnorm(n,2,8),rcauchy(n,25,2),rnorm(n,10,6))

##random generation of the indices
id = sample(1:3,n,rep = T,prob = c(.4,.2,.4))  
id = cbind(1:n,id)
temp[id]
}


> microbenchmark(rmy_ve(1e6),rmyMix(1e6))
Unit: milliseconds
       expr     min       lq     mean   median       uq      max    neval
rmy_ve(1e+06) 241.904 248.7528 272.9119 260.8752 298.1126 322.7429   100
rmyMix(1e+06) 270.917 322.3627 341.4779 329.1706 364.3947 561.2608   100

R : function to generate a mixture distribution

2 Answers2

Linked

Related