
I am trying to plot the boundary lines of the Iris data set using LDA with scikit-learn in Python, based on this documentation.

For two-dimensional data, we can easily plot the boundary lines using LDA.coef_ and LDA.intercept_.

But for multidimensional data that has been reduced to two components, LDA.coef_ and LDA.intercept_ have more dimensions, and I don't know how to use them to plot the boundary lines in the 2D reduced-dimension plot.

I've tried plotting using only the first two elements of LDA.coef_ and LDA.intercept_, but it didn't work.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()

X = iris.data
y = iris.target 
target_names = iris.target_names  

lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)

# attempt: use the first row of coef_/intercept_ as a line in the 2D plot
x = np.array([-10, 10])
y_hyperplane = -1*(lda.intercept_[0] + x*lda.coef_[0][0])/lda.coef_[0][1]

plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2

plt.plot(x,y_hyperplane,'k')

for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r2[y == i, 0], X_r2[y == i, 1], alpha=.8, color=color, lw=lw,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('LDA of IRIS dataset')

plt.show()

The boundary line produced by lda.coef_[0] and lda.intercept_[0] does not look like it could separate any two of the classes:


I've also tried using np.meshgrid to draw the class regions, but predict fails on the meshgrid points with:

ValueError: X has 2 features per sample; expecting 4

i.e. it expects the 4-dimensional original data, not the 2D points from the meshgrid.
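
For reference, a sketch of the kind of meshgrid call that triggers this error (a hypothetical reconstruction, not my exact code; lda here is the model fitted on the 4-feature data above):

# Hypothetical reconstruction of the failing attempt: lda was fitted on
# the 4-feature iris data, so predict() rejects 2-feature meshgrid points
xx, yy = np.meshgrid(np.linspace(-10, 10, 200), np.linspace(-10, 10, 200))
grid = np.c_[xx.ravel(), yy.ravel()]   # shape (n_points, 2)
Z = lda.predict(grid)                  # ValueError: X has 2 features per sample; expecting 4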

  • It looks like your code was adapted from [this](https://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_vs_lda.html#sphx-glr-auto-examples-decomposition-plot-pca-vs-lda-py), is that right? I'd bet the issue is that the points you're plotting are transformed (to maximize class separation), whereas the separating planes are in the original coordinates. Also, consider simulating some simple 2D data and making sure you understand the output and can plot the separating plane in that case. _Then_ move on to debugging this. – bogovicj Sep 05 '19 at 12:25
  • @bogovicj, Yes, it is code from the sklearn documentation. I already know how to plot the separating plane for 2D data: use the 2D elements of coef_ and intercept_, then plot directly from the equation of y_hyperplane in the code above. Yes, the issue is that the documentation transforms the original **four-dimensional** Iris data into a **two-dimensional** LDA transformation so we can visualize the result. But the elements of coef_ and intercept_ are still in **four dimensions**, so I am confused about how to use those elements to plot the hyperplane. – Luxi Sep 05 '19 at 14:37

1 Answer


Linear discriminant analysis (LDA) can be used as a classifier or for dimensionality reduction.

LDA for dimensionality reduction

Dimensionality reduction techniques reduce the number of features. The Iris dataset has 4 features; let's use LDA to reduce it to 2 features so that we can visualise it.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()
X = iris.data
y = iris.target

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)

lda = LinearDiscriminantAnalysis(n_components=2)
lda_object = lda.fit(X, y)
X = lda_object.transform(X)

for l, c, m in zip(np.unique(y), ['r', 'g', 'b'], ['s', 'x', 'o']):
    plt.scatter(X[y == l, 0],
                X[y == l, 1],
                c=c, marker=m, label=l, edgecolors='black')
plt.legend()
plt.show()

Output: [scatter plot of the 2D LDA projection of Iris, one color/marker per class]

LDA for multi class classification

LDA does multi-class classification using one-vs-rest. If you have 3 classes you will get 3 hyperplanes (decision boundaries), one for each class. If there are n features, then each hyperplane is represented by n weights (coefficients) and 1 intercept. In general:

coef_ :      shape (n_classes, n_features)
intercept_ : shape (n_classes,)

A sample, documented inline:

import matplotlib.pyplot as plt
import numpy as np
np.random.seed(13)

# Generate 3 linearly separable clusters of 2 features
X = [[0, 0]]*25 + [[0, 10]]*25 + [[10, 10]]*25
X = np.array([[np.random.randn() + v for v in point] for point in X])  # Gaussian noise around each centre
y = np.array([0]*25 + [1]*25 + [2]*25)

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis()
lda_object = lda.fit(X, y)

# Plot the data points
for l, c, m in zip(np.unique(y), ['r', 'g', 'b'], ['s', 'x', 'o']):
    plt.scatter(X[y == l, 0],
                X[y == l, 1],
                c=c, marker=m, label=l, edgecolors='black')

x1 = np.array([np.min(X[:,0], axis=0), np.max(X[:,0], axis=0)])

# Plot each class's one-vs-rest hyperplane: w1*x + w2*y + b = 0
for i, c in enumerate(['r', 'g', 'b']):
    b, w1, w2 = lda.intercept_[i], lda.coef_[i][0], lda.coef_[i][1]
    y1 = -(b + x1*w1)/w2
    plt.plot(x1, y1, c=c)

plt.legend()
plt.show()

[scatter plot of the three clusters with the three one-vs-rest decision boundaries drawn in matching colors]

As you can see, each decision boundary separates one class from the rest (follow the color of the decision boundary).

Your case

Your dataset has 4 features, so you cannot visualise the data or the decision boundaries directly (human visualisation is limited to 3D). One approach is to use LDA to reduce the dimensions to 2D, and then fit LDA again to classify those 2D features, as sketched below.
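
A rough sketch of that two-step approach (assuming a second LDA, here called lda2, fitted on the reduced 2D features; its coef_ and intercept_ then live in the 2D space and can be plotted exactly as in the sample above):

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Step 1: LDA as dimensionality reduction, 4 features -> 2
lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)

# Step 2: a second LDA fitted on the 2D features; its coef_ and
# intercept_ are now in the reduced space, so the hyperplanes can
# be drawn directly on the 2D plot
lda2 = LinearDiscriminantAnalysis()
lda2.fit(X_r2, y)

x1 = np.array([X_r2[:, 0].min(), X_r2[:, 0].max()])
for i, c in enumerate(['navy', 'turquoise', 'darkorange']):
    b, w1, w2 = lda2.intercept_[i], lda2.coef_[i][0], lda2.coef_[i][1]
    plt.plot(x1, -(b + x1*w1)/w2, c=c)
    plt.scatter(X_r2[y == i, 0], X_r2[y == i, 1], color=c,
                label=iris.target_names[i])

plt.legend()
plt.show()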
