I want to generate a 3D plot that displays the separation of the two classes. I looked at this solution, but I do not know how to add the separation plane to a px.scatter_3d figure.

Here is the code that I have so far:

import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import pandas as pd
import os
from mpl_toolkits.mplot3d import Axes3D
from sklearn.decomposition import PCA

df = pd.read_csv('df00_snippet.csv')
X_train_flat = df.drop(columns=['Label']).values

ydata = df['Label'].values

# Keep the first four principal components
pca_train = PCA(n_components=4)
x_pca = pca_train.fit_transform(X_train_flat)

y_train_new = ydata.astype(str)

# https://plotly.com/python/3d-scatter-plots/
fig = px.scatter_3d(x_pca,
            x= x_pca[:,0], y= x_pca[:,1],z = x_pca[:,2], 
            labels={'x':'PCA-1', 'y':'PCA-2','z':'PCA-3'},
            size_max=13,
            #symbol=y_train_new, 
            opacity=1,
            color=y_train_new,
            color_discrete_sequence=["blue", "green"],
            title='3d Plot of Top 3 PCA components')
fig.show()

Here is a snippet of my data:

feat1         feat2         feat3         feat4         Label
-3.8481877    -0.47685334   0.63422906    1.0396314     1
-2.320888     0.65347993    1.1519914     0.12997247    1
1.5827686     1.4119303     -1.7410104    -4.6962333    1
-0.1337152    0.13315737    -1.6648949    -1.4205348    1
-0.4028037    1.332986      1.3618442     0.3292255     1
-0.015517877  1.346349      1.4083523     0.87017965    1
-0.2669228    0.5478992     -0.06730786   -1.5959451    1
-0.03318152   0.3263167     -2.116833     -5.4616213    1
0.4588691     0.6723614     -1.617398     -4.3511734    1
0.5899199     0.66525555    -1.694493     -3.9452586    1
1.610061      2.4186094     1.8807093     1.3764497     0
1.7985699     2.4387648     1.6306056     1.1184534     0
-9.222036     -9.9776       -9.832        -9.909746     0
0.21364458    -1.0171559    -4.9093766    -6.2154694    0
-0.019955145  -1.1677283    -4.6549516    -5.9503417    0
0.44730473    -0.77167743   -4.7527356    -5.971007     0
-0.16508447   -0.005777468  -1.5020386    -4.49326      0
-0.8654994    -0.54387957   -1.300646     -4.621529     0
-1.7471086    -2.0005553    -1.7533782    -2.6065414    0
-1.5313624    -1.6995796    -1.4394685    -2.600004     0

Can you assist me in generating the separation plane? Thanks!

Joe
  • I see that you reduce the dimensionality of the data to 3D, but I don't see any computation of a classifier plane. Have you tried using an SVM for that? https://scikit-learn.org/stable/modules/svm.html If this seems relevant, let me know and I'll try to explain how to plot it. – Barak Itkin Jul 10 '22 at 05:57
  • @BarakItkin, I am using Random Forest as the classifier. – Joe Jul 10 '22 at 06:28

1 Answer

Took quite a few hours, but here's my attempt at it.

There are 2 things that need to be done:

Generating the points on the plane: we use part of the code from the "3D Plane in PCA" post, which relies on the plane equation "ax+by+cz=d", together with the fitted points in the 'x_pca' variable and the eigenvectors from the 'pca_train' variable (see the note at the end of the answer). The normal (a, b, c) is taken from the 'eig_vec' variable. The x and y coordinates are generated on a grid, the 'centroid' and 'd' values are calculated, and everything is passed into "ax+by+cz=d", which gives us the x, y and z coordinates of the plane.

Putting the plane on the scatter plot: this is the simplest part. Following the "Adding Planes to a 3D Scatter" post, we can use the points xx, yy and z to generate the plane. The colour of the plane can be changed by picking a new RGB value and changing both '#00FFFF' hex values in the colorscale.
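
The plane step can also be looked at in isolation. The following is only a minimal sketch: `plane_z` is a hypothetical helper (it is not part of the code below), and the normal and point are made-up values rather than anything computed from the data in the question. It solves a*x + b*y + c*z = d for z on a grid of x and y values.

```python
import numpy as np


def plane_z(normal, point, xx, yy):
    """Solve a*x + b*y + c*z = d for z on a grid (hypothetical helper)."""
    a, b, c = normal
    if np.isclose(c, 0.0):
        raise ValueError("vertical plane: z is not a function of x and y")
    d = np.dot(normal, point)  # the plane passes through `point`
    return (d - a * xx - b * yy) / c


# Illustrative values only, not taken from the data in the question
normal = np.array([1.0, 2.0, 4.0])
point = np.array([0.0, 0.0, 1.0])
xx, yy = np.meshgrid([-1.0, 1.0], [-2.0, 2.0])
zz = plane_z(normal, point, xx, yy)
```

The check for c close to zero matters because a plane whose normal has no z component cannot be written as z = f(x, y).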

The code:

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from sklearn.decomposition import PCA

df = pd.read_csv('df00_snippet.csv')
X_train_flat = df.drop(columns=['Label']).values

ydata = df['Label'].values

# fit_transform both fits the PCA and projects the data, so a separate fit() is not needed
pca_train = PCA(n_components=4)
x_pca = pca_train.fit_transform(X_train_flat)

y_train_new = ydata.astype(str)

# https://plotly.com/python/3d-scatter-plots/
fig = px.scatter_3d(x_pca,
            x= x_pca[:,0], y= x_pca[:,1],z = x_pca[:,2], 
            labels={'x':'PCA-1', 'y':'PCA-2','z':'PCA-3'},
            size_max=13,
            #symbol=y_train_new, 
            opacity=1,
            color=y_train_new,
            color_discrete_sequence=["blue", "green"],
            title='3d Plot of Top 3 PCA components')

# -- Start calculating the plane --
# https://stackoverflow.com/questions/49957601/how-can-i-draw-3d-plane-using-pca-in-python

eig_vec = pca_train.components_

# Third principal axis, used as the plane's normal vector;
# its first three entries serve as (a, b, c) below
normal = eig_vec[2, :]
centroid = np.mean(x_pca, axis=0)

# Every point (x, y, z) on the plane should satisfy a*x + b*y + c*z = d

# Taking centroid as a point on the plane
d = centroid.dot(normal)

# Calculate the plane's x and y coordinates at the corners of the data range
xx, yy = np.meshgrid((np.min(x_pca[:, 0]), np.max(x_pca[:, 0])), (np.min(x_pca[:, 1]), np.max(x_pca[:, 1])))
# Solve a*x + b*y + c*z = d for z
z = (d - normal[0] * xx - normal[1] * yy) / normal[2]

# Add a plane to the figure
# https://stats.stackexchange.com/questions/163356/fitting-a-plane-to-a-set-of-points-in-3d-using-pca
fig.add_trace(go.Surface(x=xx, y=yy, z=z, colorscale=[[0, '#00FFFF'], [1, '#00FFFF']],  showscale=False))
fig.show()

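Note: After running this, the 'x' and 'y' axes seem to be OK, but the 'z' values of the plane seem to be off. I think this has something to do with the line below, most likely because the rows of 'components_' are expressed in the original feature space (feat1 to feat4), while the scatter plot uses the PCA coordinates, so the normal and the plotted points are not in the same coordinate system:

eig_vec = pca_train.components_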
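
If the goal is a plane that actually separates the two classes, one option is the linear SVM suggested in the comments on the question, fitted directly on the PCA coordinates that the scatter plot shows. A Random Forest does not produce a single plane, so this swaps in a different classifier purely for drawing the boundary. The sketch below is an assumption-laden illustration: it uses synthetic data as a stand-in for 'X_train_flat' and 'ydata', and the variable names are my own.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Synthetic stand-in for X_train_flat / ydata (assumption: two classes, four features)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 4)),
               rng.normal(2.0, 1.0, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)

# Work in the same coordinates that the scatter plot shows
x_pca = PCA(n_components=3).fit_transform(X)

# A linear SVM has the decision boundary w . p + b = 0, which is a plane in 3D
clf = SVC(kernel="linear").fit(x_pca, y)
w = clf.coef_[0]
b = clf.intercept_[0]

# Solve for the axis with the largest |w| so that we never divide by a tiny number
k = int(np.argmax(np.abs(w)))
i, j = [axis for axis in range(3) if axis != k]
gi, gj = np.meshgrid(
    np.linspace(x_pca[:, i].min(), x_pca[:, i].max(), 10),
    np.linspace(x_pca[:, j].min(), x_pca[:, j].max(), 10),
)
gk = -(w[i] * gi + w[j] * gj + b) / w[k]

# Put the three grids back in x, y, z order
grids = [None, None, None]
grids[i], grids[j], grids[k] = gi, gj, gk
xx, yy, zz = grids
```

The resulting xx, yy and zz can be passed to go.Surface through fig.add_trace in the same way as in the code above. Because the SVM is trained on 'x_pca', its normal is already in the plotted coordinate system.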
Patrick