4

I am trying to fit mixture of 3 poisson distribution using flexmix package in R as per the code below:

require(flexmix)

freq<- c(222950,111682,72429,48126,34515,25801,19199,15033,11859, 9226, 
7363, 5910, 4659, 3723, 2985, 2291,1907, 1447, 
1265,891,722,620,546,439,359,286,255,236,208,148,176,145,
151,135,102, 92,136, 99,102, 92, 81, 85, 71, 84, 58, 78, 59, 66 , 47, 48, 
42, 58, 43, 38, 34, 45, 21, 28, 32, 36, 27, 22, 26, 31 ,
20, 16, 12, 19, 19, 15, 18, 17,8,8, 12, 18, 10,6,5,8,9,4,7,5,8, 
10,6,3,7,2,4,3,4,6,6,5,
3,3,6,4,4,4,2,2,3,1,2,1,3,4,2,3)

zz<- as.data.frame(rep(0:111,times=freq))

colnames(zz)<- "xx"

flexfit1<- flexmix(xx ~ 1, data = zz, k = 3, model = FLXglm(family = 
"poisson"))

summary( flexfit1)

I get the following result:

Call:
flexmix(formula = xx ~ 1, data = zz, k = 3, model = FLXglm(family = 
"poisson"))

        prior   size post>0 ratio
 Comp.1 0.796 489702 561594 0.872
 Comp.2 0.204 120115 386867 0.310

 'log Lik.' -1465654 (df=3)
 AIC: 2931315   BIC: 2931349 

I mentioned k=3 in the flexmix call. Why is there only two components in the summary? also, why is df of log likelihood only 3?

I tried to replicate the above with some random data as follows:

zz<- as.data.frame(sample(0:111,10^4,replace=T))
colnames(zz)<- "xx"

flexfit1<- flexmix(xx ~ 1, data = zz, k = 3, model = FLXglm(family = 
"poisson"))

summary( flexfit1)

I get the following output:

prior size post>0 ratio
Comp.1 0.458 4614   5737 0.804
Comp.2 0.332 3259   5117 0.637
Comp.3 0.211 2127   2759 0.771

'log Lik.' -53488.65 (df=5)
AIC: 106987.3   BIC: 107023.4 

Here, I can see the 3 components and also degrees of freedom is 5. Then, why am I not getting similar result in 1st case?

In the first case, I compared the mean and variance from of the fitted model to that of data. The following code gives the model parameters -

refit1 <- refit(flexfit1)

summary(refit1)

This gives the following output -

$Comp.1
            Estimate Std. Error z value  Pr(>|z|)    
(Intercept) 0.0962269  0.0019745  48.735 < 2.2e-16 ***
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 $Comp.2
              Estimate Std. Error z value  Pr(>|z|)    
 (Intercept) 2.2364497  0.0013964  1601.5 < 2.2e-16 ***
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now we can calculate the - mean from fitted model =p1*m1+p2*m2= 0.796*exp(0.0962269)+0.204*exp(2.2364497) = 2.785851 which is very close to mean of data (i.e. mean(zz[,1])) which is 2.778745.

variance from fitted model = p1*m1+p2*m2+p1*p2*(m1-m2)^2=0.796*exp(0.0962269)+0.204*exp(2.2364497)+0.796*0.204*(exp(0.0962269)-exp(2.2364497))^2 =13.86233 which is much lesser than variance of data (i.e. var(zz[,1]) which is 23.4328.

So intutively, it appears that to explain the variance we should go for mixture of more than two poisson distribution.

Please let me know your thoughts

  • It might be that you do not have more independent components to fit in your data. – Heikki Nov 09 '17 at 10:03
  • 1
    look at the "prior" in first case, `0.796 + 0.204` which is `1`. So, the remaining component has zero prior / mixing proportions. – kangaroo_cliff Nov 09 '17 at 10:14
  • Thanks for your reply – tamal kuila Nov 09 '17 at 13:38
  • I have added the comparison of mean and variance of the data to that of the fitted model. Please let me know your thoughts. In this case, shouldn't the component 3 appear with 0 prior and 0 size? Also, even if the prior is 0, should the df be 3 because whatever the solution is, my model has 5 free parameters? – tamal kuila Nov 09 '17 at 14:13

0 Answers0