clustering 1D data and representing clusters on matplotlib histogram

Question

I have 1D data in the format of:

areas = ...
plt.figure(figsize=(10, 10))
plt.hist(areas, bins=80)
plt.show()

The resulting histogram looks something like this:

Now I want to cluster this data. I know that I have the option of either Kernel Density Estimation or K-Means. But once I have the cluster assignments, how do I represent these clusters on the histogram?

@JayPatel I want the histogram as shown above, but with the colors indicating the cluster each data point belongs to. A legend showing the cluster center for each color would also be very nice. — SDG, Feb 23 '21 at 02:58

score 4 · Accepted Answer · edited Feb 23 '21 at 11:26

You just need to figure out the cluster assignment of each data point, and then plot each cluster's subset of the data individually, taking care that the bin edges are the same each time so that the histograms line up.

import numpy as np
import matplotlib.pyplot as plt

from sklearn.cluster import KMeans

import matplotlib as mpl
mpl.rcParams['axes.spines.top'] = False
mpl.rcParams['axes.spines.right'] = False

# simulate some fake data
n = 10000
mu1, sigma1 = 0, 1
mu2, sigma2 = 6, 2
a = mu1 + sigma1 * np.random.randn(n)
b = mu2 + sigma2 * np.random.randn(n)
data = np.concatenate([a, b])

# determine which K-Means cluster each point belongs to
cluster_id = KMeans(2).fit_predict(data.reshape(-1, 1))

# plot a histogram of counts per cluster, using the same bin edges for every subset
fig, ax = plt.subplots()
bins = np.linspace(data.min(), data.max(), 40)  # 40 edges, i.e. 39 bins
for ii in np.unique(cluster_id):
    subset = data[cluster_id==ii]
    ax.hist(subset, bins=bins, alpha=0.5, label=f"Cluster {ii}")
ax.legend()
plt.show()
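
The comment on the question also asks for a legend showing the cluster center for each color. A minimal sketch of one way to do that, assuming the same kind of simulated data as above: keep the fitted KMeans object instead of discarding it, read its cluster_centers_ attribute, and put each center into the legend label. The helper name plot_clustered_hist and its parameters (n_clusters, n_bins, seed) are hypothetical and only used for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


def plot_clustered_hist(data, n_clusters=2, n_bins=40, seed=0):
    """Histogram of 1D data colored by K-Means cluster, with centers in the legend.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    data = np.asarray(data, dtype=float)

    # keep the fitted model so the cluster centers are available afterwards
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    cluster_id = model.fit_predict(data.reshape(-1, 1))
    centers = model.cluster_centers_.ravel()

    # shared bin edges so the per-cluster histograms line up
    bins = np.linspace(data.min(), data.max(), n_bins + 1)

    fig, ax = plt.subplots()
    for ii in np.argsort(centers):  # plot clusters from left to right
        subset = data[cluster_id == ii]
        ax.hist(subset, bins=bins, alpha=0.5,
                label=f"Cluster {ii} (center = {centers[ii]:.2f})")
    ax.legend()
    return fig, ax, centers


# simulated data, as in the answer above (seeded here for reproducibility)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 10000), rng.normal(6, 2, 10000)])
fig, ax, centers = plot_clustered_hist(data)
```

Calling plt.show() afterwards displays the figure. Note that the cluster indices assigned by K-Means are arbitrary, which is why the sketch sorts the clusters by center before plotting.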
