
I've performed a latent class cluster analysis using Mclust in R. Now I want to use the fitted model to predict cluster membership for people who were not in the dataset I used for training. I know about the predict function, but it is not what I'm looking for: new people arrive daily, so I need the model parameters themselves in order to predict cluster membership.

Does anyone know how to get the right parameters that I can use in an equation to predict cluster membership myself?

data(faithful)
library(mclust)
faithfulMclust <- Mclust(faithful)
clust <- predict(faithfulMclust, faithful)  # dispatches to the Mclust predict method

Mclust uses a formula inside its predict function. I want to get this formula so that I can predict cases that are not in the dataset (I get new cases every day, so calling the predict function is not an option).

  • Welcome to Stack Overflow. When posting a question, it's best to include a [minimal, reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Please edit your question to include sample code and sample data (either fake data or a built in data set is fine) so that we can run the same code as you. This will make it much easier for others to answer your question. – MrFlick Jun 27 '14 at 14:50
  • I've added a sample script, though I'm not sure how much it will help. – user2812696 Jul 01 '14 at 07:19

1 Answer

I don't understand why you say predict won't work here. Let's say you fit your model as above and get the faithfulMclust object. Let's plot the results with

plot(faithfulMclust, what="classification")
clustmeans<-faithfulMclust$parameters$mean
text(clustmeans[1,], clustmeans[2,], seq.int(ncol(clustmeans)), cex=4)

[Figure: classification plot of the faithful data, with each cluster's number drawn at its mean]

If the next day you observe an eruption with eruptions = 2 and waiting = 50 and want to classify it using the existing model, you would use

pp <- predict(faithfulMclust, newdata=data.frame(eruptions=2, waiting=50))
pp$classification
# [1] 2

Or maybe an eruption with eruptions = 4 and waiting = 70:

pp <- predict(faithfulMclust, newdata=data.frame(eruptions=4, waiting=70))
pp$classification
# [1] 3

These assignments seem reasonable given our input data and model.
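For reference, the rule behind these assignments can be written out as the standard Gaussian mixture posterior. This is a sketch under assumptions: the fitted model has no noise component, and the symbols are my own labels for the entries of faithfulMclust$parameters, with \tau_k standing for pro, \mu_k for the k-th column of mean, and \Sigma_k for the k-th slice of variance$sigma. G is the number of clusters and d the number of variables.

```latex
z_k(x) = \frac{\tau_k \, \phi(x \mid \mu_k, \Sigma_k)}
              {\sum_{j=1}^{G} \tau_j \, \phi(x \mid \mu_j, \Sigma_j)},
\qquad
\phi(x \mid \mu, \Sigma) = (2\pi)^{-d/2} \, |\Sigma|^{-1/2}
  \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)
```

A new case x is assigned to the cluster k with the largest z_k(x).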

MrFlick
  • Yes, of course the predict function works. But the situation I'm facing is that I have roughly 500k cases that need to be predicted, and new cases are added every day. Furthermore, the clusters need to be stored in our database so they can be used in marketing campaigns. So if I can retrieve the equation behind it, I can write a query that runs daily and predicts the customers immediately and automatically in the database. – user2812696 Jul 02 '14 at 11:52
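The numbers such a query would need are stored in the fitted object under faithfulMclust$parameters. Below is a minimal sketch of recomputing the prediction by hand, assuming a multivariate model with no noise component. Each cluster's score is its mixing proportion times the Gaussian density at the new point, and the case goes to the cluster with the highest score. The helper names log_dmvnorm and manual_predict are hypothetical, not part of mclust.

```r
library(mclust)

data(faithful)
fit <- Mclust(faithful)

# Parameters to export (for example, to a database table)
pro   <- fit$parameters$pro             # mixing proportions, one per cluster
mu    <- fit$parameters$mean            # d x G matrix, one column per cluster
sigma <- fit$parameters$variance$sigma  # d x d x G array of covariance matrices

# Hypothetical helper: log density of a multivariate normal at x
log_dmvnorm <- function(x, mean, cov) {
  d <- length(mean)
  diff <- x - mean
  log_det <- as.numeric(determinant(cov, logarithm = TRUE)$modulus)
  -0.5 * (d * log(2 * pi) + log_det + sum(diff * solve(cov, diff)))
}

# Hypothetical helper: posterior membership probabilities for one new case
manual_predict <- function(x, pro, mu, sigma) {
  x <- as.numeric(x)
  log_w <- sapply(seq_along(pro), function(k) {
    log(pro[k]) + log_dmvnorm(x, mu[, k], sigma[, , k])
  })
  z <- exp(log_w - max(log_w))  # subtract the max for numerical stability
  z <- z / sum(z)
  list(classification = which.max(z), z = z)
}

# Compare against predict() for one of the cases from the answer
manual  <- manual_predict(c(2, 50), pro, mu, sigma)
builtin <- predict(fit, newdata = data.frame(eruptions = 2, waiting = 50))
manual$classification == builtin$classification
all.equal(manual$z, as.numeric(builtin$z))
```

Only pro, mu and sigma are needed at prediction time, so the same arithmetic could be ported to SQL once each cluster's covariance matrix has been inverted and its log-determinant computed ahead of time. The parameters would need to be exported again whenever the model is refitted.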