I've been testing how well PCA and LDA work for classifying 3 different types of image tags that I want to identify automatically. In my code, X is my data matrix, where each row holds the pixels of one image, and y is a 1D array giving the class of each row.
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)
plt.figure(figsize = (35, 20))
plt.scatter(X_r[:, 0], X_r[:, 1], c=y, s=200)
lda = LDA(n_components=2)
X_lda = lda.fit(X, y).transform(X)
plt.figure(figsize = (35, 20))
plt.scatter(X_lda[:, 0], X_lda[:, 1], c=y, s=200)
With LDA, I end up with 3 clearly distinguishable clusters with only slight overlap between them. Now, if I have a new image that I want to classify, once I have turned it into a 1D array, how do I predict which cluster it should fall into? And if it falls too far from the centre of every cluster, how can I mark the classification as "inconclusive"? I am also curious what the ".transform(X)" function does to my data once the model has been fit.
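
A minimal sketch of one way the prediction step could look, assuming scikit-learn's LinearDiscriminantAnalysis. The synthetic data, the `classify` helper, and the `min_proba` and `max_dist` cut-offs are hypothetical placeholders chosen for illustration, not tuned values. Here `.transform` projects rows onto the discriminant axes (the 2D coordinates used in the scatter plot), `predict_proba` gives per-class posterior probabilities, and a result is treated as inconclusive when the best probability is low or the projected point lies far from the projected mean of the winning class.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in for the image data (assumption): 3 classes, 20 "pixels" each.
rng = np.random.default_rng(0)
centres = np.zeros((3, 20))
centres[0, 0:5] = 5.0
centres[1, 5:10] = 5.0
centres[2, 10:15] = 5.0
y = np.repeat([0, 1, 2], 50)
X = centres[y] + rng.normal(size=(150, 20))

lda = LinearDiscriminantAnalysis(n_components=2)
lda.fit(X, y)

# Class means expressed in the same 2D space that .transform(X) produces.
projected_means = lda.transform(lda.means_)

def classify(image_row, min_proba=0.9, max_dist=4.0):
    """Return the predicted class, or None when the result is inconclusive.

    min_proba and max_dist are illustrative thresholds (assumptions).
    """
    row = np.asarray(image_row, dtype=float).reshape(1, -1)
    proba = lda.predict_proba(row)[0]
    best = int(np.argmax(proba))
    # Distance in the projected space to the mean of the most likely class.
    dist = np.linalg.norm(lda.transform(row)[0] - projected_means[best])
    if proba[best] < min_proba or dist > max_dist:
        return None
    return lda.classes_[best]

typical = lda.means_[1]                          # sits on a cluster centre
between = (lda.means_[0] + lda.means_[1]) / 2    # halfway between two clusters
far_away = 10 * centres[1]                       # far from every cluster

for name, row in [("typical", typical), ("between", between), ("far_away", far_away)]:
    print(name, classify(row))
```

The distance check is included because the posterior probability alone can stay high for a point that is far from every cluster, as long as it is much closer to one class than to the others. Sensible thresholds would need to be chosen from held-out data.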