Mahout's synthetic control data example

Question

Mahout's wiki includes an example of using clustering on synthetic control data (here).

The example includes a sample data set with 100 rows for each of 6 patterns. When I run the example code, I expect some clustering methods to do better than others, but I expect each of them to more or less produce clusters that match the 6 patterns.

That's not at all what I'm seeing when I run the examples. As a beginner, I find this very confusing. Furthermore, since the data isn't normalized and the periods of the cyclic data don't line up, it's hard to see how the raw data could ever cluster properly.

Am I missing something? Can a more experienced Mahout-er provide some orientation on what one should expect from this particular example?

I'm very interested in clustering patterns in time-series data. I have tried normalizing the data and clustering on point-to-point deltas, and got slightly better results. Does a more experienced data analyst have suggestions for a better approach?
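
To make the preprocessing concrete, here is a minimal sketch of per-row normalization followed by point-to-point deltas. It assumes each row is one time series given as a list of numbers. The function names (z_normalize, deltas, preprocess) are hypothetical helpers for illustration, not part of Mahout, and z-score scaling is just one possible choice of normalization.

```python
from statistics import mean, pstdev


def z_normalize(series):
    """Rescale one series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series has no shape to preserve; map it to all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def deltas(series):
    """Point-to-point differences: series[i + 1] - series[i]."""
    return [b - a for a, b in zip(series, series[1:])]


def preprocess(series):
    """Normalize first, then difference, so the features describe shape rather than level."""
    return deltas(z_normalize(series))


if __name__ == "__main__":
    row = [28.0, 30.5, 33.0, 31.0, 29.5]  # made-up values, not from the data set
    print(preprocess(row))
```

The output of preprocess has one fewer value per row than the input, and those vectors would be what gets passed to the clustering step in place of the raw rows.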


0 Answers