PCA implementation on 3D numpy array

Question

I have a feature set of shape 2240*5*16: 2240 is the number of samples, 5 is the number of channels, and 16 is the number of statistical features extracted per channel (mean, variance, etc.). I want to apply PCA, but PCA only accepts a 2D array. I tried the following code:

from sklearn.decomposition import PCA
pca = PCA(n_components=5)
pca.fit(features)

I get the following error.

ValueError: Found array with dim 3. Estimator expected <= 2.

PCA doesn't support an axis argument. Since it only works on 2D input, how can I use it in my 3D case? Any suggestions for reducing the dimensions from 2240*5*16 to 2240*5*5?

If I understand, isn't it sufficient to select? Something like `pca.fit(features[:,:,0])`. — non87, Jan 20 '21 at 19:53
You need to understand how PCA works, and what an appropriate transformation for your data would be. Once you've figured it out, SO is the right place to get help to actually do the transformation. It is not the right place to explain how mathematical tools are derived or how they should be used. — Mad Physicist, Jan 20 '21 at 19:55
@non87. Thank you so much for your answer. No, it doesn't work like this; the array is still 3D and produces the same error. — Muhammad Shahzad, Jan 20 '21 at 21:05
@MadPhysicist, thank you so much for your answer. With due respect, I know how to use PCA, and I don't think I have asked anything wrong. If I had 2D data, it would be straightforward. Here, I don't want to convert it to a 2D array. My question was simply how to reduce the feature dimension from 16 to 5. Thanks again :) — Muhammad Shahzad, Jan 20 '21 at 21:09
@MuhammadShahzad. You are the only one that really understands what your data means, and how to get it into a format that is suitable for the PCA that you want. Reducing a feature dimension from 16 to 5 can be done in any number of ways, and choosing that way is therefore up to you. Once you've decided what you want to do, it becomes an answerable programming question. — Mad Physicist, Jan 20 '21 at 22:12
@MuhammadShahzad sure! It is a simple case of reducing the array dimension from 3 to 2. Please check: https://stackoverflow.com/q/34972142/5041759 and https://stackoverflow.com/q/48003185/5041759. However, you need to be careful to distinguish the dimensionality of the array (which is 3 and should be 2) from the dimensionality of the feature set. These are two different things. Hope this helps! — foobar, Jan 21 '21 at 00:25

score 2 · Accepted Answer · answered Jan 21 '21 at 04:10

I would just loop over the channels and fit a separate PCA on each channel's 2D slice (samples by features).

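import numpy as np
from sklearn.decomposition import PCA

# Dummy data: 1000 samples, 5 channels, 10 features per channel
X = np.random.rand(1000, 5, 10)
n_components = 5

X_transform = np.zeros((X.shape[0], X.shape[1], n_components))
for i in range(X.shape[1]):
    # Fit a separate PCA on the 2D slice (samples, features) of channel i
    pca = PCA(n_components=n_components)
    X_transform[:, i, :] = pca.fit_transform(X[:, i, :])

print(X_transform.shape)  # (1000, 5, 5)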
