I am generating a PCA plot using scikit-learn, NumPy and matplotlib. I want to know how to label each point (each row in my data). I found "annotate" in matplotlib, but it seems to be for labeling specific coordinates, or for putting text on arbitrary points in the order they appear. I'm trying to abstract away from this, but I'm struggling because the PCA steps come before the matplotlib code. Is there a way to do this with sklearn while I'm still generating the plot, so that each point doesn't lose its connection to the row it came from? Here's my code:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import decomposition

# x: feature matrix, one row per sample; y: class label (0 or 1) for each row

# Create a randomized PCA model that keeps two components
# (decomposition.RandomizedPCA was removed from scikit-learn;
# PCA with svd_solver='randomized' is its replacement)
randomized_pca = decomposition.PCA(n_components=2, svd_solver='randomized')

# Fit and transform the data to the model
reduced_data_rpca = randomized_pca.fit_transform(x)

# Create a regular PCA model
pca = decomposition.PCA(n_components=2)

# Fit and transform the data to the model
reduced_data_pca = pca.fit_transform(x)

# Inspect the shape: (n_samples, 2)
print(reduced_data_pca.shape)

# Print out the data
print(reduced_data_rpca)
print(reduced_data_pca)

def rand_jitter(arr):
    # Add Gaussian noise scaled to 1% of the data range (not used below)
    stdev = .01 * (max(arr) - min(arr))
    return arr + np.random.randn(len(arr)) * stdev

# Plot each class in its own color
colors = ['red', 'blue']
for i in range(len(colors)):
    w = reduced_data_pca[:, 0][y == i]
    z = reduced_data_pca[:, 1][y == i]
    plt.scatter(w, z, c=colors[i])
targ_names = ["Negative", "Positive"]
plt.legend(targ_names, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title("PCA Scatter Plot")
plt.show()
```
theupandup

1 Answer

PCA is a projection, not a clustering (you tagged this as clustering).
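To make the distinction concrete: a fitted (non-whitened) PCA model maps each row independently by subtracting the fitted mean and multiplying by the component matrix, so output row i depends only on input row i. A minimal sketch, assuming random toy data in place of the question's `x`:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data standing in for the question's x (assumption: 10 rows, 5 features)
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 5))

pca = PCA(n_components=2)
reduced = pca.fit_transform(x)

# PCA is a linear projection: center with the fitted mean, then project
# onto the principal axes (the rows of components_). No labels are involved.
manual = (x - pca.mean_) @ pca.components_.T

# The projection is applied row by row, so row i of the output still
# corresponds to row i of the input; transforming row 3 alone gives row 3.
row3 = pca.transform(x[3:4])

print(np.allclose(reduced, manual))
print(np.allclose(row3[0], reduced[3]))
```

Both checks print `True` (up to floating-point tolerance), which is why any per-row label can simply be carried along by position.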

There is no concept of a label in PCA: it neither uses class labels nor assigns any to the points.

You can draw text labels onto a scatter plot, but it usually becomes too crowded. There are already answers to this on Stack Overflow.
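Because `fit_transform` returns rows in the same order as its input, one way to keep the connection to the original rows is to hold the row names in a parallel list and pass each name to matplotlib's `annotate` together with the matching projected coordinates. A minimal sketch; the toy data and the `row_names` list are hypothetical stand-ins for the question's `x` and whatever identifies each row:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Hypothetical data: 8 rows of 4 features, plus one name per row
rng = np.random.default_rng(1)
x = rng.normal(size=(8, 4))
row_names = [f"sample_{i}" for i in range(len(x))]

reduced = PCA(n_components=2).fit_transform(x)

fig, ax = plt.subplots()
ax.scatter(reduced[:, 0], reduced[:, 1])

# Row i of `reduced` came from row i of `x`, so zip keeps each point with its name
for name, (px, py) in zip(row_names, reduced):
    ax.annotate(name, xy=(px, py), xytext=(3, 3),
                textcoords="offset points", fontsize=8)

ax.set_xlabel("First Principal Component")
ax.set_ylabel("Second Principal Component")
# In an interactive session, drop the Agg line and call plt.show() to display it
```

With many rows the labels overlap, which is the crowding problem mentioned above; looping over only a subset of row indices (for example, the points farthest from the origin) is one way to keep the plot readable.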

Has QUIT--Anony-Mousse