I ran a PCA with the following script.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
from sklearn.preprocessing import StandardScaler
raw_data_frame = pd.read_table(
    '/content/drive/MyDrive/BI/colab_input_output/16samples_vaf_df_forpca.csv',
    sep=",", header=0, index_col=0)
data_scaler = StandardScaler()
data_scaler.fit(raw_data_frame)
scaled_data_frame = data_scaler.transform(raw_data_frame)
from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
pca.fit(scaled_data_frame)
x_pca = pca.transform(scaled_data_frame)
plt.figure(figsize=(10, 7))
plt.scatter(x_pca[:,0],x_pca[:,1], c=raw_data_frame['target'], cmap='viridis')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
The output is a scatter plot of the first two principal components (image not included here).
I want to label each dot with its sample name, which is stored in the dataframe index.
The format of the dataframe is
                  ('1', 187963806)  ('19', 49972822)  ('8', 14555764)  ('11', 127666530)  ('18', 67693298)  target
15_R71_epi        0.310344828       0.227272727       0.217391304      0.149253731        0                 1
15_R21_epi        0.1875            0.228070175       0.173913043      0.25862069         0                 1
15_L133_epi       0.078947368       0.085714286       0.145454545      0.119047619        0                 1
15_L58_epi        0.222222222       0.19047619        0.302325581      0.333333333        0                 1
15_C5_epi         0.267326733       0.132075472       0.275362319      0.220779221        0                 1
15_Lt_Nasal_derm  0.359375          0.039215686       0.274509804      0.192982456        0                 2
15-H-21           0.322580645       0.255319149       0.238095238      0.380952381        0                 3
15_H-55           0.446808511       0.27027027        0.387755102      0.347826087        0                 3
15_H-49           0.30952381        0.236363636       0.266666667      0.235294118        0                 3
15_H-3            0.12962963        0.153846154       0.085106383      0.205479452        0                 3
15_H-33           0.349206349       0.263157895       0.298245614      0.328571429        0                 3
15-RK-62          0.235294118       0.152173913       0.191780822      0.2                0                 4
15_RK-29          0.078431373       0.094339623       0.175438596      0.121212121        0                 4
15_LK-168         0.185185185       0.132075472       0.12             0.2                0                 5
15_LK-114         0.173076923       0.075             0.14893617       0.237288136        0                 5
15_LK-176         0.253968254       0.113207547       0.127272727      0.291666667        0.035087719       5
(The columns may not line up here, but the values should paste cleanly if you copy them.)
The colors of the dots correspond to the values in the "target" column.
However, the figure does not show which dot belongs to which sample.
How can I add the sample names to the dots as labels?
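
For reference, here is a minimal sketch of what I think the labeling could look like, using one `annotate` call per point. The function name `plot_labeled_pca`, the shortened column names (`chr1`, `chr19`, `chr8`), and the four-row `demo_frame` are hypothetical and only for illustration; the values are copied from the table above. Note that this sketch assumes the "target" column should be left out of the PCA input, which differs from my script above.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA


def plot_labeled_pca(frame, label_column="target"):
    """Scatter the first two principal components and label each dot with its index value."""
    # Assumption: the class label is not used as a PCA feature.
    features = frame.drop(columns=label_column)
    scaled = StandardScaler().fit_transform(features)
    x_pca = PCA(n_components=2).fit_transform(scaled)

    fig, ax = plt.subplots(figsize=(10, 7))
    ax.scatter(x_pca[:, 0], x_pca[:, 1], c=frame[label_column], cmap="viridis")
    # One text label per point, offset slightly so it does not cover the marker.
    for name, (pc1, pc2) in zip(frame.index, x_pca):
        ax.annotate(str(name), (pc1, pc2), xytext=(5, 5),
                    textcoords="offset points", fontsize=8)
    ax.set_xlabel("First Principal Component")
    ax.set_ylabel("Second Principal Component")
    return fig, ax, x_pca


# Four rows copied from the table in the question; column names are shortened (hypothetical).
demo_frame = pd.DataFrame(
    {
        "chr1": [0.310344828, 0.359375, 0.322580645, 0.235294118],
        "chr19": [0.227272727, 0.039215686, 0.255319149, 0.152173913],
        "chr8": [0.217391304, 0.274509804, 0.238095238, 0.191780822],
        "target": [1, 2, 3, 4],
    },
    index=["15_R71_epi", "15_Lt_Nasal_derm", "15-H-21", "15-RK-62"],
)
fig, ax, x_pca = plot_labeled_pca(demo_frame)
```

With the real data this would presumably be called as `plot_labeled_pca(raw_data_frame)`. With 16 samples some labels may overlap, so the offset and font size would likely need tuning.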