I ran a PCA with the following script.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
from sklearn.preprocessing import StandardScaler

# Read the CSV; the first column holds the sample names and becomes the index
raw_data_frame = pd.read_table(
    '/content/drive/MyDrive/BI/colab_input_output/16samples_vaf_df_forpca.csv',
    sep=",", header=0, index_col=0)

data_scaler = StandardScaler()
data_scaler.fit(raw_data_frame)
scaled_data_frame = data_scaler.transform(raw_data_frame)
from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
pca.fit(scaled_data_frame)

x_pca = pca.transform(scaled_data_frame)
plt.figure(figsize=(10, 7))
plt.scatter(x_pca[:,0],x_pca[:,1], c=raw_data_frame['target'], cmap='viridis')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')

And the output is:

[Figure: scatter plot of the first two principal components, colored by target]

I want to label each dot with its sample name, which is stored in the index of the dataframe.

The format of the dataframe is

                  ('1', 187963806)   ('19', 49972822)   ('8', 14555764)    ('11', 127666530)  ('18', 67693298)   target
15_R71_epi        0.310344828        0.227272727        0.217391304        0.149253731        0                  1
15_R21_epi        0.1875             0.228070175        0.173913043        0.25862069         0                  1
15_L133_epi       0.078947368        0.085714286        0.145454545        0.119047619        0                  1
15_L58_epi        0.222222222        0.19047619         0.302325581        0.333333333        0                  1
15_C5_epi         0.267326733        0.132075472        0.275362319        0.220779221        0                  1
15_Lt_Nasal_derm  0.359375           0.039215686        0.274509804        0.192982456        0                  2
15-H-21           0.322580645        0.255319149        0.238095238        0.380952381        0                  3
15_H-55           0.446808511        0.27027027         0.387755102        0.347826087        0                  3
15_H-49           0.30952381         0.236363636        0.266666667        0.235294118        0                  3
15_H-3            0.12962963         0.153846154        0.085106383        0.205479452        0                  3
15_H-33           0.349206349        0.263157895        0.298245614        0.328571429        0                  3
15-RK-62          0.235294118        0.152173913        0.191780822        0.2                0                  4
15_RK-29          0.078431373        0.094339623        0.175438596        0.121212121        0                  4
15_LK-168         0.185185185        0.132075472        0.12               0.2                0                  5
15_LK-114         0.173076923        0.075              0.14893617         0.237288136        0                  5
15_LK-176         0.253968254        0.113207547        0.127272727        0.291666667        0.035087719        5

The color of each dot corresponds to the number in the "target" column.

However, I can't tell from the figure which dot belongs to which sample.

How can I do this?

SG Kwon
  • Does this answer your question? [Scatter plot with different text at each data point](https://stackoverflow.com/questions/14432557/scatter-plot-with-different-text-at-each-data-point) – ekon Jan 05 '23 at 07:16
  • @ekon Some part of the question is similar, but I don't know how the labeling can be applied to my script. – SG Kwon Jan 09 '23 at 05:31
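
For what it's worth, here is a minimal sketch of how the approach from the linked question might be applied to the script above: loop over the dataframe index and the PCA coordinates together and call `annotate` once per point. The helper name `plot_labeled_pca` and its `target_col` parameter are hypothetical, not from the original post. Note one deliberate difference from the original script: this sketch leaves the `target` column out of the scaled features, on the assumption that it is a group label rather than a measurement.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def plot_labeled_pca(df: pd.DataFrame, target_col: str = "target"):
    """Plot the first two principal components and label each point
    with its row index (the sample name). Hypothetical helper."""
    # Assumption: the target column is a group label, not a feature,
    # so it is dropped before scaling and PCA.
    features = df.drop(columns=[target_col])
    scaled = StandardScaler().fit_transform(features)
    x_pca = PCA(n_components=2).fit_transform(scaled)

    fig, ax = plt.subplots(figsize=(10, 7))
    ax.scatter(x_pca[:, 0], x_pca[:, 1], c=df[target_col], cmap="viridis")

    # Rows of x_pca are in the same order as df.index, so zip pairs
    # each sample name with its coordinates.
    for name, (x, y) in zip(df.index, x_pca):
        ax.annotate(str(name), (x, y), xytext=(5, 5),
                    textcoords="offset points", fontsize=8)

    ax.set_xlabel("First Principal Component")
    ax.set_ylabel("Second Principal Component")
    return fig, ax
```

With the dataframe from the question this would be called as `fig, ax = plot_labeled_pca(raw_data_frame)`. The small `xytext` offset keeps the text from sitting on top of the marker; with 16 samples some labels may still overlap, and adjusting the offset or font size is the simplest thing to try.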
