
For my project I work with three-dimensional MRI data, where the fourth dimension represents different subjects (I use the package nilearn for this). I am using sklearn.decomposition.PCA to extract a given number of principal components from my data. Now I would like to plot the components separately on a brain image, i.e. show a brain image with my extracted components (in this case, 2) in different colors.

Here is example code using the OASIS dataset, which can be downloaded via the nilearn API. It consists of three steps:

  1. masking using nilearn.input_data.NiftiMasker, which converts my 4-dimensional data into a 2-dimensional array (n_subjects x n_voxels)
  2. standardizing the data matrix using StandardScaler
  3. running the PCA using sklearn.decomposition.PCA:
## set workspace
import numpy as np

from nilearn.datasets import fetch_oasis_vbm
from nilearn.input_data import NiftiMasker
from nilearn.image import index_img

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

from nilearn import plotting

## Load Data  #################################################################

# take only first 30 subjects as example
oasis_dataset = fetch_oasis_vbm(n_subjects=30)
imgs = np.array(oasis_dataset['gray_matter_maps'])

## PIPELINE ###################################################################

# create a random number generator
rng = np.random.RandomState(42)

# Convert Images to 2D Data Array 
niftimasker = NiftiMasker(mask_strategy='template')

# z-standardize images
scaler = StandardScaler()

# Extract 2 Components
pca = PCA(n_components=2,
          svd_solver='full',
          random_state=rng)

# create pipeline
pipe = Pipeline([('niftimasker',niftimasker),
                 ('scaler',scaler),
                 ('pca',pca)])

# call fit_transform on pipeline
X = pipe.fit_transform(imgs)

If I understand correctly, what I obtain after running the PCA are the PCA loadings? Unfortunately, I don't understand how to get from there to two images, each containing one PCA component.

Johannes Wiesner

1 Answer


To get the data back into image format, you need to call NiftiMasker.inverse_transform(). For that to work, the voxel dimension of your array must be preserved.

The way the pipeline works now, you are reducing dimensionality over voxel space. In case you instead wanted to reduce dimensionality over subject space, you would change the pipeline as follows:

pipe = Pipeline([('niftimasker', niftimasker),
                 ('scaler', scaler)])  # note: no PCA step inside the pipeline

X = pipe.fit_transform(imgs)
X_reduced = pca.fit_transform(X.T).T
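To make the shape bookkeeping concrete, here is a small synthetic sketch (NumPy/scikit-learn only; the dimensions are toy stand-ins for the real 30 x 229007 masked OASIS matrix) of the subject-space reduction above:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)

n_subjects, n_voxels = 30, 1000          # toy stand-ins for 30 x 229007
X = rng.rand(n_subjects, n_voxels)       # what pipe.fit_transform would return

# PCA over subject space: voxels act as samples, subjects as features
pca = PCA(n_components=2, svd_solver='full', random_state=rng)
X_reduced = pca.fit_transform(X.T).T

print(X_reduced.shape)  # (2, 1000): one full voxel map per component
```

Because each row of X_reduced is still a full voxel map, it has the shape NiftiMasker.inverse_transform expects, yielding a 4D image with one frame per component.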

Then you would apply an inverse transform as follows:

component_image = niftimasker.inverse_transform(X_reduced)

Then, to get each individual component image, use index_img from nilearn.image. E.g. this is the image for the first component:

component1_image = index_img(component_image,0)

However, I think you are interested in reducing dimensionality over voxel space. In that case, to preserve the voxel dimensions for the inverse transform, you need the index of each voxel feature chosen by the PCA dimensionality reduction. Keep your pipeline the way you had it originally and do the following:

X = pipe.fit_transform(imgs)

components = pca.components_  # shape (n_components, n_voxels)

# in your case 2; in general, replace range(2) with range(n_components)
most_important = [np.abs(components[i]).argmax() for i in range(2)]
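most_important simply records, for each component, the index of the single voxel with the largest absolute weight. A small synthetic check (NumPy/scikit-learn only, toy dimensions in place of the real data):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(30, 50)  # toy stand-in for 30 subjects x 229007 voxels

pca = PCA(n_components=2, svd_solver='full')
pca.fit(X)

components = pca.components_  # shape (2, 50): one weight per voxel
most_important = [np.abs(components[i]).argmax() for i in range(2)]

# one voxel index per component, each a valid column of X
print(most_important)
```

The exact indices depend on the data; the point is that each component is collapsed to its single strongest voxel.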

Then create NaN-filled arrays of shape (n_subjects, n_voxels) (in your case 30 x 229007):

comp1 = np.full((30, 229007), np.nan)
comp2 = np.full((30, 229007), np.nan)
for x, y in enumerate(X):
    comp1[x, most_important[0]] = y[0]
    comp2[x, most_important[1]] = y[1]
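With toy dimensions, the loop above fills exactly one column of each NaN array with the subjects' component scores (the voxel indices 7 and 23 below are hypothetical stand-ins for the argmax results):

```python
import numpy as np

n_subjects, n_voxels = 30, 50                       # toy stand-ins
X = np.random.RandomState(1).rand(n_subjects, 2)    # PCA scores, 2 components
most_important = [7, 23]                            # hypothetical voxel indices

comp1 = np.full((n_subjects, n_voxels), np.nan)
comp2 = np.full((n_subjects, n_voxels), np.nan)
for x, y in enumerate(X):
    comp1[x, most_important[0]] = y[0]  # subject x's score on component 1
    comp2[x, most_important[1]] = y[1]  # subject x's score on component 2

# every entry is NaN except the single chosen voxel column (one value per row)
print(np.isnan(comp1).sum())  # 30*50 - 30 = 1470
```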

Then apply the inverse transform to each component:

component1_image = niftimasker.inverse_transform(comp1)
component2_image = niftimasker.inverse_transform(comp2)

You will now have 2 images, each with 30 subjects and a single valid voxel value representing the chosen component. How you aggregate that voxel over the 30 subjects is up to you; here I simply use the mean_img function from nilearn.image:

from nilearn.image import mean_img

mean_component1_image = mean_img(component1_image)
mean_component2_image = mean_img(component2_image)
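Conceptually, the aggregation is a mean over the subject dimension; since each subject image is NaN everywhere except the chosen voxel, the mean is only defined at that voxel. A NumPy analogue using nanmean (toy shapes, not nilearn; mean_img itself may propagate NaNs differently):

```python
import warnings
import numpy as np

rng = np.random.RandomState(2)
n_subjects, n_voxels = 30, 50
comp1 = np.full((n_subjects, n_voxels), np.nan)
comp1[:, 7] = rng.rand(n_subjects)  # index 7 is a hypothetical chosen voxel

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # all-NaN columns warn
    mean_map = np.nanmean(comp1, axis=0)             # average over subjects

print(np.isnan(mean_map).sum())  # 49: every voxel but the chosen one stays NaN
```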

Finally, in both cases, plot the respective image. In the voxel-reduced version you will see a small variation between the two images in the X dimension (second projection), but hardly any in Y and Z. I'm using plot_glass_brain from nilearn.plotting:

plotting.plot_glass_brain(mean_component1_image)
plotting.plot_glass_brain(mean_component2_image)

For overlays, adjusted color maps that make this easier to visualize, and other plotting options, consult this and other nilearn plotting guides:

https://nilearn.github.io/plotting/index.html#different-display-modes

Let me know if you have any more questions.

user1321988
  • Thanks for your answer. Concerning your first question: Yes, I would like to plot the components in voxel space, so I want to run my PCA over the subjects, not for each subject. I tried to follow your code, but I am not sure what the resulting images show? What exactly happens when you create most_important? Sorry, I am quite a beginner in this. I was expecting a brain plot like described for nilearn.decomposition.CanICA: https://nilearn.github.io/auto_examples/03_connectivity/plot_compare_decomposition.html#sphx-glr-auto-examples-03-connectivity-plot-compare-decomposition-py – Johannes Wiesner Apr 07 '20 at 09:47
  • PCA and CanICA are very different. Check out a description of CanICA here: https://arxiv.org/abs/0911.4650. CanICA identifies the group-reproducible data subspace before performing ICA, in other words grouping voxels spatially. PCA just assumes every voxel is independent and treats each as a feature. This results in 1 voxel per component out of the 200,000-plus, while the former would most likely have significantly more to display in each of its components. Since the results don't make sense, could you explain why you are choosing PCA for dimensionality reduction? – user1321988 Apr 07 '20 at 14:12