
I am looking for a piece of software (python preferred, but really anything for which a jupyter kernel exists) to fit a data sample to a mixture of t-distributions.

I have searched for quite a while already, and this seems to be a somewhat obscure endeavor: most search results are about mixtures of Gaussians (which I am not interested in here).

The most promising candidates so far are the "AdMit" and "MitSEM" R packages. However, I do not know R, I find the descriptions of these packages rather complex, and their core objective seems to be not the fitting of mixtures of t’s itself but using it as a step towards something else.

This is in a nutshell what I want the software to accomplish:

Fit a mixture of t-distributions to some data and estimate the "location", "scale", and "degrees of freedom" of each component.
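For concreteness, the model meant here is the standard finite mixture of location-scale t-distributions. In the notation below (mine, not taken from any of the packages mentioned), component k has location μ_k, scale σ_k, degrees of freedom ν_k, and mixing weight π_k, with π_k ≥ 0 and the weights summing to 1:

$$
f(x) = \sum_{k=1}^{K} \pi_k \, \frac{1}{\sigma_k} \, t_{\nu_k}\!\left(\frac{x-\mu_k}{\sigma_k}\right),
\qquad
t_{\nu}(z) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1+\frac{z^{2}}{\nu}\right)^{-\frac{\nu+1}{2}}
$$

Fitting then means estimating all of μ_k, σ_k, ν_k, and π_k from the sample, typically by maximum likelihood.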

I hope someone can point me to a simple package; I can’t believe that this is such an obscure use case.

vare

2 Answers


This seems to work (in R):

Simulate example:

 set.seed(101)
 x <- c(5+ 3*rt(1000,df=5),
        10+1*rt(10000,df=20))

Fit:

 library(teigen)
 tt <- teigen(x,
        Gs=2,   # two components
        scale=FALSE,dfupdate="numeric",
        models=c("univUU")  # univariate model, unconstrained scale and df
        # (i.e. scale and df can vary between components)
 )

The parameters are all reasonably close (except for the df of the second component, but that is very hard to estimate ...)

 tt$parameters[c("df","mean","sigma","pig")]
 ## $df    ## degrees of freedom
 ## [1]  3.578491 47.059841  
 ## $mean  ## ("location")
 ##           [,1]
 ## [1,]  4.939179
 ## [2,] 10.002038
 ## $sigma    ## reporting variance rather than sd (I think?)
 ## , , 1
 ##          [,1]
 ## [1,] 8.763076
 ## , , 2
 ##          [,1]
 ## [1,] 1.041588
 ## $pig     ## mixture probabilities
 ## [1] 0.09113273 0.90886727
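A quick check of the `sigma` question, assuming (as the numbers suggest) that teigen stores each component's squared scale σ_k² rather than its variance: the square roots should then land near the scales 3 and 1 used in the simulation. Strictly speaking, the variance of a t component is σ_k²ν_k/(ν_k − 2), which exists only for ν_k > 2, so for the heavy-tailed first component it is noticeably larger than `sigma`. A small Python sketch of that arithmetic, with the values copied from the output above:

```python
import math

# Values copied from the teigen output above.
# Assumption: teigen's sigma holds the squared scale sigma_k^2 of each component.
df = [3.578491, 47.059841]
sigma2 = [8.763076, 1.041588]

# Square roots put the estimates on the same footing as the 3 and 1 in the simulation
scale = [math.sqrt(s2) for s2 in sigma2]

# Variance of a t component with squared scale s2 and df nu (finite only for nu > 2)
variance = [s2 * nu / (nu - 2) for s2, nu in zip(sigma2, df)]

print([round(s, 3) for s in scale])  # [2.96, 1.021]
```

The first component's variance comes out near 19.9, far from 8.76, while its square-rooted `sigma` is close to 3, which supports reading `sigma` as the squared scale.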
Ben Bolker
  • thanks Ben, this seems to do exactly what I need! Now I only need to find a way to make this callable from within python ;) As an R newbie: why is the output of the sigma parameter so seemingly scrambled up? – vare Jul 02 '17 at 10:55
  • It's more complex than seems necessary because `teigen` is primarily intended for *multivariate* t-mixtures, which would have separate variance-covariance matrices for each component. `c(tt$parameters$sigma)` should reduce the array to a vector. It should not be too hard to develop your own maximum-likelihood estimation of a univariate t-mixture with one of the many MLE-fitting libraries for Python, but it will be more "from scratch" than you seem interested in (and since I'm less familiar with Python, more effort for me to whip something up) – Ben Bolker Jul 02 '17 at 14:52
  • Excellent answer Ben. This was just what I was looking for. – bill_080 Oct 29 '18 at 21:45
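Following up on the suggestion in the comments to roll your own maximum-likelihood fit in Python, here is a minimal sketch. It is not the algorithm teigen uses; it simply minimizes the negative mixture log-likelihood with `scipy.optimize.minimize` (L-BFGS-B with numerical gradients), using `scipy.stats.t.logpdf` for the component densities. Scales and degrees of freedom are optimized on the log scale and the weights through a softmax, so every parameter stays valid. The function names `unpack`, `neg_log_lik`, and `fit_t_mixture`, the quantile-based starting values, and the box bounds on scale and df are hypothetical, illustrative choices.

```python
import numpy as np
from scipy import optimize, special, stats

# Illustrative sketch: names, starting values, and bounds are hypothetical choices.


def unpack(theta, k):
    """Map an unconstrained vector to (weights, loc, scale, df) for k components."""
    loc = theta[:k]
    scale = np.exp(theta[k:2 * k])                   # log-scale -> scale > 0
    df = np.exp(theta[2 * k:3 * k])                  # log-df -> df > 0
    logits = np.concatenate([[0.0], theta[3 * k:]])  # first logit pinned at 0
    weights = special.softmax(logits)                # weights > 0, sum to 1
    return weights, loc, scale, df


def neg_log_lik(theta, x, k):
    """Negative log-likelihood of a univariate k-component t-mixture."""
    weights, loc, scale, df = unpack(theta, k)
    # (n, k) matrix of log(pi_j) + log t_{nu_j}(x_i; mu_j, sigma_j)
    comp = np.log(weights) + stats.t.logpdf(x[:, None], df, loc=loc, scale=scale)
    return -special.logsumexp(comp, axis=1).sum()


def fit_t_mixture(x, k=2, loc0=None):
    """Maximum-likelihood fit; returns weights, loc, scale, df sorted by loc."""
    x = np.asarray(x, dtype=float)
    if loc0 is None:
        # crude default start: evenly spaced sample quantiles
        loc0 = np.quantile(x, (np.arange(k) + 0.5) / k)
    theta0 = np.concatenate([np.asarray(loc0, dtype=float),
                             np.full(k, np.log(x.std())),  # common starting scale
                             np.full(k, np.log(10.0)),     # starting df = 10
                             np.zeros(k - 1)])             # equal starting weights
    # box constraints keep scale in [e^-10, e^10] and df in [0.5, 1000]
    bounds = ([(None, None)] * k + [(-10.0, 10.0)] * k
              + [(np.log(0.5), np.log(1000.0))] * k + [(-20.0, 20.0)] * (k - 1))
    res = optimize.minimize(neg_log_lik, theta0, args=(x, k),
                            method="L-BFGS-B", bounds=bounds)
    weights, loc, scale, df = unpack(res.x, k)
    order = np.argsort(loc)  # sort components by location for readability
    return {"weights": weights[order], "loc": loc[order], "scale": scale[order],
            "df": df[order], "nll": res.fun, "converged": res.success}
```

The returned `scale` is σ_k itself, so it should be compared with the square root of teigen's `sigma`. Like any mixture likelihood, this one has local optima. With data resembling the R example (`5 + 3 * rng.standard_t(5, 1000)` and `10 + rng.standard_t(20, 10000)` for some `rng = np.random.default_rng(...)`), the default quantile start may end up in a poor one, so passing explicit starting locations via `loc0`, or trying several starts and keeping the fit with the smallest `nll`, is advisable.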

Late to this party, but since you prefer something in Python: there appear to be several packages on PyPI that fit finite Student's t mixtures, including:

https://pypi.org/project/studenttmixture/

https://pypi.org/project/student-mixture/

https://pypi.org/project/smm/

All of these can be installed with pip.

Scikit-learn (whose mixture module only offers Gaussian mixtures) and the other usual suspects don't have this functionality at this time.

HappyDog