
I have a dataset with 23 rows and 48 columns. I am applying PCA to reduce the number of column dimensions. I use the following code examples and see that only 23 components are retained:

#first
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA().fit(only_features)
plt.figure(figsize=(15, 8))
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')

#second
import pandas as pd

df_pca = pca.fit_transform(X=only_features)
df_pca = pd.DataFrame(df_pca)
print(df_pca.shape)

However, I would like to know which features are required. For example: if the original dataset had columns A–Z and was reduced by PCA, I would want to know which features were selected.

How to do that?

Thanks for the help.

Ken White
K C

1 Answer


Credit to this answer & answer: Sklearn's documentation states that when you don't specify the n_components parameter, the number of components retained is min(n_samples, n_features). So min(23, 48) = 23, which is why you get 23 components in your case.
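A minimal sketch of that default behavior, using random data as a stand-in for your 23×48 dataset:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(23, 48))  # 23 rows, 48 columns, like your dataset

# n_components is left unspecified, so it defaults to min(n_samples, n_features)
pca = PCA().fit(X)
print(pca.n_components_)  # min(23, 48) = 23
```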

Solution 1: if you use the Sklearn library (credit to this answer)

  • check the variance of the PCs with: pca.explained_variance_ratio_
  • check the importance of the PCs with: print(abs(pca.components_))
  • use a customized function to extract more info about the PCs; see this answer.
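Putting those pieces together, here is a minimal sketch of mapping each PC back to the original column that loads most strongly on it. It assumes a DataFrame with named columns standing in for your only_features:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Stand-in for only_features: 23 rows, 48 named columns
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(23, 48)),
                 columns=[f'col{i}' for i in range(48)])

pca = PCA().fit(X)

# Each row of pca.components_ holds one PC's loadings on the 48 original columns.
# The column with the largest absolute loading contributes most to that PC.
top_feature_per_pc = X.columns[np.argmax(np.abs(pca.components_), axis=1)]
print(top_feature_per_pc[:5])
```

Note that PCA does not *select* original columns; every PC is a weighted mix of all of them, so this only tells you which column dominates each PC.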

Solution 2: if you use the PCA library (documentation)

# Requires the separate `pca` package: pip install pca
from pca import pca

# Initialize
model = pca()
# Fit transform
out = model.fit_transform(X)

# Print the top features. The results show that f1 is best, followed by f2, etc.
print(out['topfeat'])

#     PC feature
# 0  PC1      f1
# 1  PC2      f2
# 2  PC3      f3
# 3  PC4      f4
# 4  PC5      f5
# ...

You can even make a plot of the PCs with: model.plot()


Mario