When I used the function "princomp" in Matlab to reduce the dimensions of my features, it warned: "Columns of X are linearly dependent to within machine precision. Using only the first 320 components to compute TSQUARED".

What does it mean? The original dimension of the features is 324. I would be very grateful if somebody could answer my question.

Vivian Lee

2 Answers

For a more graphic interpretation of this warning, imagine your data being 3-dimensional instead of 324-dimensional. These would be points in space. The output of your function princomp should be the principal axes of an ellipsoid that aligns well with your data. The equivalent warning, "Using only the first 2 components", would mean: your data points lie on a plane (up to numerical error), so your ellipsoid really is a flat ellipse. Since PCA is usually used for dimensionality reduction, this isn't really worrying. It just means that your original data is not 324-dimensional, but really only 320-dimensional, even though it resides in R^324.

You would get the same warning using this random data:

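N = 100;
% Columns 1 and 2 vary; column 3 is constant, so the points lie in a plane
X = [10*rand(N,1), 2*rand(N,1), zeros(N,1)];
X_centered = bsxfun(@minus, X, mean(X));   % subtract the column means
% Requesting the fourth output (tsquare) is what triggers the warning
[coeff, score, latent, tsquare] = princomp(X_centered);
plot3(X_centered(:,1), X_centered(:,2), X_centered(:,3), '.');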

[Figure: Random points]

coeff(:,1) will be approximately [1;0;0] (up to sign) and latent(1) will be the largest value, because the data is spread out most along the x-axis. The second vector coeff(:,2) will be approximately [0;1;0], and latent(2) will be considerably smaller than latent(1), because the second most important direction is the y-axis, along which the data is less spread out. The remaining vectors are chosen to be orthonormal to the ones already found. In this simple case the only possibility is [0;0;1] (again up to sign), and latent(3) will be zero because the data is flat. Bear in mind that the principal components are always orthogonal to each other.
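
The claims above can be checked without princomp. The following is only a minimal sketch that uses an eigendecomposition of the sample covariance instead; the tolerance and the variable names (lambda, tol, numComponents) are assumptions made for illustration and are not taken from MATLAB's own implementation.

```matlab
rng(0);                                    % fixed seed, only for reproducibility
N = 100;
X = [10*rand(N,1), 2*rand(N,1), zeros(N,1)];

C = cov(X);                                % 3x3 sample covariance (cov centers the data itself)
[V, D] = eig(C);                           % columns of V are the principal directions
[lambda, order] = sort(diag(D), 'descend');% variances along those directions, largest first
V = V(:, order);

% Assumed tolerance for "zero within machine precision" (illustrative choice)
tol = max(size(X)) * eps(max(lambda));
numComponents = sum(lambda > tol);         % directions that carry nonzero variance

fprintf('variances: %g %g %g\n', lambda);
fprintf('usable components: %d of %d\n', numComponents, size(X, 2));
```

Here numComponents comes out as 2, which corresponds to the "Using only the first 2 components" situation described above.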

knedlsepp

This warning occurs only if nargout>3, i.e. tsquared is requested. You can still trust the principal components if you get this warning.

I encountered this warning once because some columns of my input matrix had zero variance, so they did not contribute any "information" to the data. More generally, as @knedlsepp described in their answer, the warning occurs if there are latent dimensions that have zero (or almost zero) extent. In those cases, Hotelling's T-squared statistic tsquared cannot be computed over all components, because that would involve dividing by a zero variance.
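
To make the division by zero concrete, here is a rough sketch of how the T-squared statistic can be built from the scores and variances, and why it has to be restricted to the components with nonzero variance. This illustrates the idea only and is not MATLAB's actual code; the names r, t2_all and t2_reduced and the tolerance are assumptions.

```matlab
rng(0);                                    % fixed seed, only for reproducibility
N = 100;
X = [10*rand(N,1), 2*rand(N,1), zeros(N,1)];
Xc = bsxfun(@minus, X, mean(X));           % centered data

[V, D] = eig(cov(X));
[lambda, order] = sort(diag(D), 'descend');% variance along each principal direction
V = V(:, order);
score = Xc * V;                            % coordinates in the principal axes

% Number of components with nonzero variance (assumed tolerance)
r = sum(lambda > max(size(X)) * eps(max(lambda)));

% Using all components divides by a variance that is zero within machine
% precision, so these values are not meaningful (typically NaN or Inf)
t2_all = sum(bsxfun(@rdivide, score.^2, lambda'), 2);

% Restricting to the first r components gives finite, meaningful values
t2_reduced = sum(bsxfun(@rdivide, score(:,1:r).^2, lambda(1:r)'), 2);
```

Restricting the sum to the first r components is the behaviour that the warning text announces.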

Have a look at the source code by entering open pca in the Matlab console. For me, simply disabling the specific warning was sufficient:

warning('off', 'stats:pca:ColRankDefX')
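
If you would rather not switch the warning off for the whole session, one option is to save its state first and restore it afterwards. This is only a sketch; the variable name prevState is made up for illustration, and the PCA call itself is left as a comment.

```matlab
id = 'stats:pca:ColRankDefX';
prevState = warning('query', id);   % remember whether the warning was on or off
warning('off', id);                 % silence it for the next call only

% ... call pca / princomp with four outputs here ...

warning(prevState);                 % restore the previous setting
```
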
normanius