Looking at the `hypertools.plot` source code, the issue appears to be that combining `reduce` and `cluster` in the same call to `plot` first reduces the data to three dimensions (or fewer, if a smaller `ndims` is specified) and then clusters. When you take the step-by-step approach, the dimensionality is not reduced to three until you plot, by which point you have already clustered. Limiting the dimensions with `ndims=3` in the `analyze` call of the step-by-step approach produces the same results as the one-liner you want.
So the answer to your question "Is there a way to get the same clusters out without plotting?" is to pass `ndims=3` to the `analyze` function.
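To see why the order of the two steps matters, here is a toy sketch in plain Python. It does not use hypertools: `keep_first_dims` and `nearest_seed_labels` are hypothetical stand-ins for a reducer and a clustering model (they are not SparsePCA or Birch), and the three points are made up for illustration.

```python
import math


def keep_first_dims(points, ndims):
    """Toy 'reduction' (assumption): keep only the first `ndims` coordinates."""
    return [p[:ndims] for p in points]


def nearest_seed_labels(points, seeds):
    """Toy 'clustering' (assumption): label each point by its closest seed."""
    return [
        min(range(len(seeds)), key=lambda k: math.dist(p, seeds[k]))
        for p in points
    ]


# Made-up 4-D data; the first two points double as the cluster seeds.
points = [
    (0.0, 0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0, 5.0),
    (0.9, 0.0, 0.0, 0.0),
]

# Cluster in the original 4-D space (what happens without ndims=3).
labels_cluster_first = nearest_seed_labels(points, points[:2])

# Reduce to 3-D first, then cluster (the order plot uses).
reduced = keep_first_dims(points, 3)
labels_reduce_first = nearest_seed_labels(reduced, reduced[:2])

print(labels_cluster_first)
print(labels_reduce_first)
```

The third point ends up in a different cluster depending on whether the fourth coordinate is still present when the labels are assigned, which is the same kind of mismatch as between the one-liner and the step-by-step approach.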
From `plot.py` (hypertools v0.6.2):

    # reduce data to 3 dims for plotting, if ndims is None, return this
    if (ndims and ndims < 3):
        xform = reducer(xform, ndims=ndims, reduce=reduce, internal=True)
    else:
        xform = reducer(xform, ndims=3, reduce=reduce, internal=True)
    # find cluster and reshape if n_clusters
    if cluster is not None:
        if hue is not None:
            warnings.warn('cluster overrides hue, ignoring hue.')
        if isinstance(cluster, (six.string_types, six.binary_type)):
            model = cluster
            params = default_params(model)
        elif isinstance(cluster, dict):
            model = cluster['model']
            params = default_params(model, cluster['params'])
        else:
            raise ValueError('Invalid cluster model specified; should be'
                             ' string or dictionary!')
        if n_clusters is not None:
            if cluster in ('HDBSCAN',):
                warnings.warn('n_clusters is not a valid parameter for '
                              'HDBSCAN clustering and will be ignored.')
            else:
                params['n_clusters'] = n_clusters
        cluster_labels = clusterer(xform, cluster={'model': model,
                                                   'params': params})
        xform, labels = reshape_data(xform, cluster_labels, labels)
        hue = cluster_labels
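The reduction branch at the top of that excerpt boils down to a single rule for the dimensionality at which `plot` clusters. A minimal sketch of that rule, assuming only the logic shown in the excerpt; `plot_cluster_ndims` is a hypothetical helper name used for illustration and is not part of hypertools:

```python
def plot_cluster_ndims(ndims=None):
    """Dimensionality the data has when plot() clusters it.

    Mirrors the `if (ndims and ndims < 3)` branch of the excerpt:
    a smaller ndims is honoured, anything else becomes 3.
    """
    if ndims and ndims < 3:
        return ndims
    return 3


for requested in (None, 2, 3, 10):
    print(requested, "->", plot_cluster_ndims(requested))
```

So unless you ask for fewer than three dimensions, the clustering inside `plot` always sees three-dimensional data, which is why `ndims=3` is the value to pass to `analyze`.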
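Example using the mushrooms sample data set:

    import hypertools
    import numpy as np
    # IPython/Jupyter magic; omit this line outside a notebook
    %matplotlib inline

    geo = hypertools.load('mushrooms')
    data = geo.get_data()

    # reduce to 3 dimensions first, exactly as plot does internally
    reduced = hypertools.analyze(data, ndims=3, reduce="SparsePCA")
    # cluster the already-reduced data; no plotting needed to get the labels
    labels = hypertools.cluster(reduced, cluster="Birch")
    hypertools.plot(reduced, '.', hue=labels)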

This gives the same results as:

    hypertools.plot(data, '.', reduce="SparsePCA", cluster="Birch")

Compare this with the step-by-step approach without passing `ndims=3` to `analyze`:
