I clustered my data with K-means in scikit-learn and plotted the cluster regions following the scikit-learn example. Next, I ran K-means again within each cluster, and I want to show the boundaries of the sub-clusters on the same plot. I found this question interesting, but when I apply that method, the axis ranges change and a new plot appears.

Edited: My function is as follows:

def plot_pca_clusters_races_match(pca_km, reduced_data, pca_data_winner,
                                  race1_pca_km, race1_reduced_data, race1_pca_data_winner, race1_nclusters,
                                  race2_pca_km, race2_reduced_data, race2_pca_data_winner, race2_nclusters,
                                  plt_opt, fig_path, race_approach, n_clusters):

    """
    :param pca_km: K-means trained by PCA data (2 components)
    :param reduced_data: PCA components
    :param pca_data_winner: player_id, pca_component1, pca_component2, race_id, winner
    :param plt_opt: space required to plot cluster area
    :param fig_path: path to save the plot
    :param race_approach:
    :param n_clusters:
    :return:
    """

    race_id_list = ['Z', 'T', 'P']
    # 1- Plot cluster area
    x_min, x_max = reduced_data[:, 0].min() + plt_opt[0], reduced_data[:, 0].max() + plt_opt[1]
    y_min, y_max = reduced_data[:, 1].min() + plt_opt[2], reduced_data[:, 1].max() + plt_opt[3]
    step = abs((abs(x_max) - abs(x_min))) / 100
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step), np.arange(y_min, y_max, step))
    Z = pca_km.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.figure(1)
    plt.clf()

    # Plot cluster regions
    plt.imshow(Z, interpolation='nearest',
               extent=(xx.min(), xx.max(), yy.min(), yy.max()),
               cmap=plt.cm.Paired,
               aspect='auto', origin='lower')

    # 2- Plot cluster members
    race_ids = list(set(pca_data_winner[:, -3]))

    # Find race type
    reduced_data_race1 = pca_data_winner[np.where(pca_data_winner[:, -3] == race_ids[0]), :][0]

    # Plot race 1
    plt.plot(reduced_data_race1[:, 2], reduced_data_race1[:, 3], 'k.', markersize=4, color='red',
             label=race_id_list[int(race_ids[0])])

    # Plot race 2
    # If the race is non-symmetric, change color of the cluster members
    if len(race_ids) > 1:
        reduced_data_race2 = pca_data_winner[np.where(pca_data_winner[:, -3] == race_ids[1]), :][0]
        # Note: the `hold` keyword was removed from Matplotlib; artists are added to the current axes by default
        plt.plot(reduced_data_race2[:, 2], reduced_data_race2[:, 3], 'k.', markersize=4, color='green',
                 label=race_id_list[int(race_ids[1])])

    # 3-Plot cluster centers
    markers = ['d', 'v', 's', '*', 'h', 'p', 'o']
    for cluster in range(0, pca_km.cluster_centers_.shape[0]):
        plt.scatter(pca_km.cluster_centers_[cluster, 0], pca_km.cluster_centers_[cluster, 1],
                    marker=markers[cluster], s=80, linewidths=1,
                    label='Cluster ' + str(cluster),
                    color='b', zorder=4)
        plt.xlabel('PC 1')
        plt.ylabel('PC 2')

    plt.legend(prop={'size':8})

    # --------------------------------------------- Plot boundaries of sub-clusters
    x1_min, x1_max = race1_reduced_data[:, 0].min() + plt_opt[0], race1_reduced_data[:, 0].max() + plt_opt[1]
    y1_min, y1_max = race1_reduced_data[:, 1].min() + plt_opt[2], race1_reduced_data[:, 1].max() + plt_opt[3]

    step = abs((abs(x_max) - abs(x_min))) / 100
    xx1, yy1 = np.meshgrid(np.arange(x1_min, x1_max, step), np.arange(y1_min, y1_max, step))

    Z1 = race1_pca_km.predict(np.c_[xx1.ravel(), yy1.ravel()])
    Z1 = Z1.reshape(xx1.shape)

    # Plot sub-cluster boundaries
    plt.contour(Z, extent=(xx.min(), xx.max(), yy.min(), yy.max()))

The first plot: [image]

After trying to add contours and scaling: [image]

Yuhang Lin
YNR

1 Answer

The first plot (without the contour) sits in the lower-left corner of the second plot. This is because contour hasn't been given a proper scale, in which case it simply extends over the row and column indices of the Z array.
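
To make the scaling issue concrete, here is a minimal sketch on made-up data (the grid, the quadrant labels, and all variable names are assumptions for illustration, not taken from the question). It compares the axis limits produced by `contour` with and without `extent`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up grid in data coordinates: x in [-3, 3], y in [-2, 2]
xx, yy = np.meshgrid(np.linspace(-3, 3, 60), np.linspace(-2, 2, 40))
Z = (xx > 0).astype(int) + 2 * (yy > 0).astype(int)  # four quadrant "labels"

# No coordinates given: contour places Z on its column/row indices
fig1, ax1 = plt.subplots()
ax1.contour(Z)
index_xlim = ax1.get_xlim()

# Extent given: contour places Z on the data coordinates
fig2, ax2 = plt.subplots()
ax2.contour(Z, extent=(xx.min(), xx.max(), yy.min(), yy.max()))
scaled_xlim = ax2.get_xlim()

print("x-limits without extent:", index_xlim)
print("x-limits with extent:   ", scaled_xlim)
```

In the first case the x-axis runs over the column indices of `Z` rather than over the data range, which is why an existing plot drawn in data coordinates ends up squeezed into a corner.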

You either need to supply the extent to the contour

plt.contour(Z, extent=(..,..,..,..))

or specify some X and Y arrays to determine the coordinates.

plt.contour(X,Y,Z)
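
As a hedged sketch of the second variant applied to sub-clusters: the centers below are hypothetical stand-ins for `pca_km.cluster_centers_` and `race1_pca_km.cluster_centers_`, and `nearest_center` is a made-up helper that assigns each point to its closest center (the same assignment rule K-means prediction uses), so the example runs without the asker's data. Masking the sub-cluster labels outside the parent cluster is one possible way to confine the boundary to that cluster; it is an assumption about what is wanted, not something the question confirms.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def nearest_center(points, centers):
    """Hypothetical helper: label each point by its closest center."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

# Assumed centers: four parent clusters, and two sub-clusters inside parent 3
centers = np.array([[-2.0, -1.0], [2.0, -1.0], [-2.0, 1.0], [2.0, 1.0]])
sub_centers = np.array([[1.5, 0.5], [2.5, 1.5]])

# One shared grid, so regions and contours use the same coordinates
xx, yy = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-3, 3, 150))
grid = np.c_[xx.ravel(), yy.ravel()]
Z = nearest_center(grid, centers).reshape(xx.shape)       # parent labels
Z1 = nearest_center(grid, sub_centers).reshape(xx.shape)  # sub-cluster labels

fig, ax = plt.subplots()
ax.imshow(Z, interpolation="nearest",
          extent=(xx.min(), xx.max(), yy.min(), yy.max()),
          cmap=plt.cm.Paired, aspect="auto", origin="lower")

# Hide sub-cluster labels outside parent cluster 3, then draw the 0/1 boundary
Z1_masked = np.ma.masked_where(Z != 3, Z1)
cs = ax.contour(xx, yy, Z1_masked, levels=[0.5], colors="k", linewidths=1)

xlim = ax.get_xlim()
fig.savefig("subcluster_boundaries.png")
```

Because `xx` and `yy` are passed explicitly, the contour is drawn in the same data coordinates as the `imshow` regions, and the axis limits stay at the range of the grid.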
ImportanceOfBeingErnest
  • Thank you for your help. How can I find the extent? I tried `plt.contour(Z, extent=(xx.min(), xx.max(), yy.min(), yy.max()))` and `plt.contour(Z, extent=(x_min, x_max, y_min, y_max))`. As a result, it keeps the first plot and draws some lines in the center of the plot. – YNR Mar 20 '17 at 17:48
  • The extent should be the same as for the imshow plot. So I guess what you did is correct. – ImportanceOfBeingErnest Mar 20 '17 at 17:53
  • Now, the second plot shows the first plot too, but I need to put the contour around the sub-clusters. – YNR Mar 21 '17 at 08:51
  • Is this a problem of the calculation or the plotting? Be reminded that we are not into the subject, so what may be obvious to you needs to be explained to everyone else. In particular, what to expect in which part of the plot is not at all clear, so you need to describe it in detail. – ImportanceOfBeingErnest Mar 21 '17 at 09:00
  • In the first plot, I show four cluster regions in different colors. Then, I clustered the red points to divide them into 2 sub-clusters (race1_pca_km is the K-means clustering model). My problem is how to visualize the boundaries of the sub-clusters. I tried to present the boundaries with contours, but it is not working well. – YNR Mar 21 '17 at 09:09