Mixture of poisson distribution using flexmix in R

Question

I am trying to fit mixture of 3 poisson distribution using flexmix package in R as per the code below:

require(flexmix)

freq<- c(222950,111682,72429,48126,34515,25801,19199,15033,11859, 9226, 
7363, 5910, 4659, 3723, 2985, 2291,1907, 1447, 
1265,891,722,620,546,439,359,286,255,236,208,148,176,145,
151,135,102, 92,136, 99,102, 92, 81, 85, 71, 84, 58, 78, 59, 66 , 47, 48, 
42, 58, 43, 38, 34, 45, 21, 28, 32, 36, 27, 22, 26, 31 ,
20, 16, 12, 19, 19, 15, 18, 17,8,8, 12, 18, 10,6,5,8,9,4,7,5,8, 
10,6,3,7,2,4,3,4,6,6,5,
3,3,6,4,4,4,2,2,3,1,2,1,3,4,2,3)

zz<- as.data.frame(rep(0:111,times=freq))

colnames(zz)<- "xx"

flexfit1<- flexmix(xx ~ 1, data = zz, k = 3, model = FLXglm(family = 
"poisson"))

summary( flexfit1)

I get the following result:

Call:
flexmix(formula = xx ~ 1, data = zz, k = 3, model = FLXglm(family = 
"poisson"))

        prior   size post>0 ratio
 Comp.1 0.796 489702 561594 0.872
 Comp.2 0.204 120115 386867 0.310

 'log Lik.' -1465654 (df=3)
 AIC: 2931315   BIC: 2931349

I mentioned k=3 in the flexmix call. Why is there only two components in the summary? also, why is df of log likelihood only 3?

I tried to replicate the above with some random data as follows:

zz<- as.data.frame(sample(0:111,10^4,replace=T))
colnames(zz)<- "xx"

flexfit1<- flexmix(xx ~ 1, data = zz, k = 3, model = FLXglm(family = 
"poisson"))

summary( flexfit1)

I get the following output:

prior size post>0 ratio
Comp.1 0.458 4614   5737 0.804
Comp.2 0.332 3259   5117 0.637
Comp.3 0.211 2127   2759 0.771

'log Lik.' -53488.65 (df=5)
AIC: 106987.3   BIC: 107023.4

Here, I can see the 3 components and also degrees of freedom is 5. Then, why am I not getting similar result in 1st case?

In the first case, I compared the mean and variance from of the fitted model to that of data. The following code gives the model parameters -

refit1 <- refit(flexfit1)

summary(refit1)

This gives the following output -

$Comp.1
            Estimate Std. Error z value  Pr(>|z|)    
(Intercept) 0.0962269  0.0019745  48.735 < 2.2e-16 ***
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 $Comp.2
              Estimate Std. Error z value  Pr(>|z|)    
 (Intercept) 2.2364497  0.0013964  1601.5 < 2.2e-16 ***
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now we can calculate the - mean from fitted model =p1*m1+p2*m2= 0.796*exp(0.0962269)+0.204*exp(2.2364497) = 2.785851 which is very close to mean of data (i.e. mean(zz[,1])) which is 2.778745.

variance from fitted model = p1*m1+p2*m2+p1*p2*(m1-m2)^2=0.796*exp(0.0962269)+0.204*exp(2.2364497)+0.796*0.204*(exp(0.0962269)-exp(2.2364497))^2 =13.86233 which is much lesser than variance of data (i.e. var(zz[,1]) which is 23.4328.

So intutively, it appears that to explain the variance we should go for mixture of more than two poisson distribution.

Please let me know your thoughts

It might be that you do not have more independent components to fit in your data. — Heikki, Nov 09 '17 at 10:03
look at the "prior" in first case, `0.796 + 0.204` which is `1`. So, the remaining component has zero prior / mixing proportions. — kangaroo_cliff, Nov 09 '17 at 10:14
I have added the comparison of mean and variance of the data to that of the fitted model. Please let me know your thoughts. In this case, shouldn't the component 3 appear with 0 prior and 0 size? Also, even if the prior is 0, should the df be 3 because whatever the solution is, my model has 5 free parameters? — tamal kuila, Nov 09 '17 at 14:13

Mixture of poisson distribution using flexmix in R

0 Answers0