import numpy as np
from sklearn.decomposition import PCA

X = np.array([[24,13,38],[8,3,17],[21,6,40],[1,14,-9],[9,3,21],[7,1,14],[8,7,11],[10,16,3],[1,3,2],
              [15,2,30],[4,6,1],[12,10,18],[1,9,-4],[7,3,19],[5,1,13],[1,12,-6],[21,9,34],[8,8,7],
              [1,18,-18],[15,8,25],[16,10,29],[7,0,17],[14,2,31],[3,7,0],[5,6,7]])
pca = PCA(n_components=1)

pca.fit(X)
a = pca.components_[0][0] # a
b = pca.components_[0][1] # b
c = pca.components_[0][2] # c

def average(values):
    if len(values) == 0:
        return None
    return sum(values, 0.0) / len(values)

x, y, z = X[:, 0], X[:, 1], X[:, 2]  # columns of the data
x_mean = average(x) # For an approximation
y_mean = average(y)
z_mean = average(z)
d = -(a * x_mean + b * y_mean + c * z_mean)

So the plane would be -0.375978766054x + 0.10612154283y - 0.920531469111z + 15.1366572005 = 0

Actually, I'm not sure this is right.

I want to draw this plane using the matplotlib library.

How can I code this?

Brandon Minnick
seong park

2 Answers


Each principal component defines a vector in the feature space. PCA orders those vectors by the variance of the data along each direction, so the first vector points along the direction of maximum variance and the last along the direction of minimum variance. Assuming the data are distributed around a plane, the third vector should be perpendicular to that plane. Here's the code:

import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

X = np.array([[24,13,38],[8,3,17],[21,6,40],[1,14,-9],[9,3,21],[7,1,14],[8,7,11],[10,16,3],[1,3,2],
    [15,2,30],[4,6,1],[12,10,18],[1,9,-4],[7,3,19],[5,1,13],[1,12,-6],[21,9,34],[8,8,7],
  [1,18,-18],[15,8,25],[16,10,29],[7,0,17],[14,2,31],[3,7,0],[5,6,7]])
pca = PCA(n_components=3)

pca.fit(X)
eig_vec = pca.components_

print(pca.explained_variance_ratio_)
# [0.90946569 0.08816839 0.00236591]
# The last vector explains less than 0.3% of the variance

# The direction of minimum variance is the normal vector of the plane
normal = eig_vec[2, :]  # (a, b, c)
centroid = np.mean(X, axis=0)

# Every point (x, y, z) on the plane should satisfy a*x + b*y + c*z + d = 0
# Taking the centroid as a point on the plane gives d
d = -centroid.dot(normal)

# Draw plane
xx, yy = np.meshgrid(np.arange(np.min(X[:, 0]), np.max(X[:, 0])), np.arange(np.min(X[:, 1]), np.max(X[:, 1])))

z = (-normal[0] * xx - normal[1] * yy - d) * 1. / normal[2]

# plot the surface
# recent Matplotlib versions no longer accept gca(projection='3d')
plt3d = plt.figure().add_subplot(111, projection='3d')
plt3d.plot_surface(xx, yy, z)
plt3d.scatter(*(X.T))
plt.show()

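A quick way to check a fitted plane like the one above is to measure how far each data point lies from it. The following is a minimal sketch, not part of the original answer: it uses NumPy's SVD instead of scikit-learn's PCA (the principal directions should match up to sign), and the helper names `fit_plane` and `signed_distances` are made up for illustration. It also compares against a plane whose normal is the first principal direction, which is what the question tried.

```python
import numpy as np

X = np.array([[24, 13, 38], [8, 3, 17], [21, 6, 40], [1, 14, -9], [9, 3, 21],
              [7, 1, 14], [8, 7, 11], [10, 16, 3], [1, 3, 2], [15, 2, 30],
              [4, 6, 1], [12, 10, 18], [1, 9, -4], [7, 3, 19], [5, 1, 13],
              [1, 12, -6], [21, 9, 34], [8, 8, 7], [1, 18, -18], [15, 8, 25],
              [16, 10, 29], [7, 0, 17], [14, 2, 31], [3, 7, 0], [5, 6, 7]])


def fit_plane(points, component=-1):
    """Return (normal, d) for the plane normal . p + d = 0 through the centroid.

    component=-1 picks the direction of minimum variance (hypothetical helper).
    """
    centroid = points.mean(axis=0)
    # Rows of vt are unit-length principal directions, largest variance first.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[component]
    d = -normal.dot(centroid)
    return normal, d


def signed_distances(points, normal, d):
    """Signed orthogonal distance of each point to the plane."""
    return (points @ normal + d) / np.linalg.norm(normal)


# Plane whose normal is the last principal direction.
normal, d = fit_plane(X)
dist = signed_distances(X, normal, d)

# For comparison: plane whose normal is the first principal direction.
normal_first, d_first = fit_plane(X, component=0)
dist_first = signed_distances(X, normal_first, d_first)

print("RMS distance, last component as normal: ", np.sqrt(np.mean(dist ** 2)))
print("RMS distance, first component as normal:", np.sqrt(np.mean(dist_first ** 2)))
```

The first RMS value should be much smaller than the second, which is one way to see why the last component, not the first, is the right normal for the plane.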

kastaa
  • Your code doesn't run. Line 29 returns a syntax error for some reason: z = (-normal[0] * xx - normal[1] * yy - d) * 1. / normal[2]. I tried it on an external compiler as well, with the same result. – HermanK May 05 '20 at 20:42
  • @KantushovHerman Thanks for your comment. I just realized that I forgot to close a parenthesis on line 28 (I lost it while copying and pasting from my Jupyter notebook). I've edited my answer, so it should run now. – kastaa May 07 '20 at 00:33

The first principal component doesn't define a plane; it defines a vector in three dimensions. Here's how to visualize it in 3D. The code starts out with yours and then adds the plotting steps:

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[24, 13, 38], [8, 3, 17], [21, 6, 40], [1, 14, -9], [9, 3, 21], [7, 1, 14],
              [8, 7, 11], [10, 16, 3], [1, 3, 2], [15, 2, 30], [4, 6, 1], [12, 10, 18], [1, 9, -4],
              [7, 3, 19], [5, 1, 13], [1, 12, -6], [21, 9, 34], [8, 8, 7], [1, 18, -18],
              [15, 8, 25], [16, 10, 29], [7, 0, 17], [14, 2, 31], [3, 7, 0], [5, 6, 7]])

pca = PCA(n_components=1)
pca.fit(X)

## New code below
p = pca.components_
centroid = np.mean(X, 0)
segments = np.arange(-40, 40)[:, np.newaxis] * p

import matplotlib
matplotlib.use('TkAgg') # might not be necessary for you
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
plt.ion()

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
scatterplot = ax.scatter(*(X.T))
lineplot = ax.plot(*(centroid + segments).T, color="red")
plt.xlabel('x')
plt.ylabel('y')
plt.savefig('result.png', dpi=150)

(Note the above code was auto-formatted with yapf, which I highly recommend.) Resulting figure:

Scatter plot of data with principal component
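To make the "vector, not plane" point concrete without plotting, here is a rough sketch (my own addition, under the assumption that NumPy's SVD can stand in for scikit-learn's PCA, which should give the same direction up to sign). It projects every point onto the line through the centroid along the first principal direction and reports how much of the variance that line captures. The variable names are only illustrative.

```python
import numpy as np

X = np.array([[24, 13, 38], [8, 3, 17], [21, 6, 40], [1, 14, -9], [9, 3, 21],
              [7, 1, 14], [8, 7, 11], [10, 16, 3], [1, 3, 2], [15, 2, 30],
              [4, 6, 1], [12, 10, 18], [1, 9, -4], [7, 3, 19], [5, 1, 13],
              [1, 12, -6], [21, 9, 34], [8, 8, 7], [1, 18, -18], [15, 8, 25],
              [16, 10, 29], [7, 0, 17], [14, 2, 31], [3, 7, 0], [5, 6, 7]])

centroid = X.mean(axis=0)
# Rows of vt are unit-length principal directions, largest variance first.
_, s, vt = np.linalg.svd(X - centroid, full_matrices=False)
direction = vt[0]  # first principal direction; its sign is arbitrary

# Coordinate of every point along the line centroid + t * direction.
t = (X - centroid) @ direction
# Closest point on that line for every data point.
projected = centroid + np.outer(t, direction)

# Fraction of the total variance captured by the line.
ratio = s[0] ** 2 / np.sum(s ** 2)
print("variance captured by the first component:", ratio)
```

Assuming the same data, the ratio should be close to the first entry of `explained_variance_ratio_` reported by scikit-learn.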

Ahmed Fasih