I'm using the mclust library for R ( http://www.stat.washington.edu/mclust ) to do some experimental EM-based GMM clustering. The package is great and seems to generally find very good clusters for my data.
The problem is that I don't really know R at all, and while I have managed to muddle through the clustering process based on the help() contents and the extensive readme, I cannot for the life of me figure out how to write out the actual cluster results to file. I am using the following absurdly simple script to perform the clustering,
myData <- read.csv("data.csv", sep=",", header=FALSE)
attach(myData)
myBIC <- mclustBIC(myData)
mySummary <- summary( myBIC, data=myData )
at which point I have cluster results and a summary. The data in data.csv is just a list of multi-dimensional points, one per line. So each line looks like 'x,y,z' (in the case of 3 dimensions).
If I use 2d points (e.g. just the x and y vals) I can then use the internal plot function to get a very pretty graph that plots the original points and color codes each point based on the cluster it was assigned to. So I know all the info is somewhere in 'myBIC', but the docs and help don't seem to provide any insight as to how to print out this data!
I want to print out a new file based on the results I believe are encoded in myBIC. Something like,
CLUST x, y, z
1 1.2, 3.4, 5.2
1 1.2, 3.3, 5.2
2 5.5, 1.3, 1.3
3 7.1, 1.2, -1.0
3 7.2, 1.2, -1.1
and then - hopefully - also print out the parameters/centroids of the individual gaussians/clusters that the clustering process found.
Surely this is an absurdly easy thing to do and I'm just too ignorant of R to figure it out...
EDIT: I seem to have gotten a little bit further along. Doing the following prints out a somewhat cryptic matrix,
> mySummary$classification
[1] 1 1 2 1 3
[6] 1 1 1 3 1
[11] 1 2 1 3 1
[16] 1 3
which upon reflection I realized is actually the list of samples and their classifications. I guess it is not possible to write this directly via the write command, but a bit more experimentation in the R console led me to realize that I can do this:
> newData <- mySummary$classification
> write.csv( newData, file="class.csv" )
and that the result actually looks pretty nice!
$ head class.csv
"","x"
"1",1
"2",2
"3",2
where the first column apparently matches the index for the input data, and the second column describes the assigned class identity.
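For the 'CLUST x, y, z' layout described earlier, a minimal sketch would be to bind the classification vector to the input points before writing. It uses base R only. The data frame and label vector below are made-up stand-ins for myData and mySummary$classification, and the output file name is hypothetical.

```r
# Stand-in for the points read from data.csv (values are made up).
myData <- data.frame(x = c(1.2, 1.2, 5.5, 7.1, 7.2),
                     y = c(3.4, 3.3, 1.3, 1.2, 1.2),
                     z = c(5.2, 5.2, 1.3, -1.0, -1.1))

# Stand-in for mySummary$classification: one cluster label per input row.
classification <- c(1, 1, 2, 3, 3)

# Put the label in front of the coordinates, one row per input point.
labelled <- data.frame(CLUST = classification, myData)

# row.names = FALSE drops the index column that write.csv adds by default.
outFile <- file.path(tempdir(), "clustered.csv")
write.csv(labelled, file = outFile, row.names = FALSE)
```

With the real objects, the same idea would presumably be data.frame(CLUST = mySummary$classification, myData), assuming the classification vector is in the same order as the input rows.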
The 'mySummary$parameters' object appears to be nested though, and has a bunch of sub-objects corresponding to the individual gaussians and their parameters, etc. The 'write' function fails when I try to just write it out, but individually writing out each sub object name is a bit tedious. Which leads me to a new question: how do I iterate over a nested object in R and print the elements out in a serial fashion to a file descriptor?
I have this 'mySummary$parameters' object. It is composed of several sub-objects like 'mySummary$parameters$variance$sigma', etc. I would like to just iterate over everything and print it all to file in the same way that this is done to the CLI automatically...
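As a sketch of what I am after, base R seems to offer two routes: capture.output() to redirect the console rendering to a file, or a small recursive function that walks the list. The nested list below is only a stand-in with roughly the shape of 'mySummary$parameters' (its names and numbers are illustrative, not real mclust output), and writeNested is a hypothetical helper, not part of any package.

```r
# Stand-in with the same general shape as mySummary$parameters
# (names and numbers are illustrative assumptions).
params <- list(
  pro = c(0.5, 0.5),
  mean = matrix(c(1.2, 3.4, 7.1, 1.2), nrow = 2,
                dimnames = list(c("x", "y"), c("1", "2"))),
  variance = list(
    modelName = "EII",
    sigmasq = 0.25
  )
)

# Option 1: write exactly what the console would print.
outFile <- file.path(tempdir(), "parameters.txt")
capture.output(print(params), file = outFile)

# Option 2: walk the list recursively and write one labelled block per leaf.
writeNested <- function(obj, con, prefix = "") {
  if (is.list(obj)) {
    keys <- names(obj)
    # Fall back to positions when the list has no names.
    if (is.null(keys)) keys <- as.character(seq_along(obj))
    for (i in seq_along(obj)) {
      writeNested(obj[[i]], con, paste0(prefix, "$", keys[i]))
    }
  } else {
    writeLines(prefix, con)                      # path to this leaf
    writeLines(capture.output(print(obj)), con)  # its printed value
  }
}

leafFile <- file.path(tempdir(), "parameters_by_leaf.txt")
con <- file(leafFile, open = "w")
writeNested(params, con, prefix = "params")
close(con)
```

Option 1 is the closest match to "the same way that this is done to the CLI", while option 2 leaves room to swap the print() call for something like write.table() on each leaf if a stricter file format is needed.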