I have a dataset of size 200*119, i.e. 200 samples and 119 variables/features. I want to use PCA to reduce my feature set, keeping only what contributes significantly to classification.
I understand the concept of PCA but am unable to implement it. I have computed the coeff and score of my data using the pca function:
[coeff, score] = pca(data);
The coeff matrix is now of size 119x119.
But what do I do with this information? My goal is to find the reduced dataset that can be fed into a classifier. I have gone through the documentation for pcares and looked at similar questions about this issue, but I am unable to understand how [residuals, reconstructed] = pcares(data, ndim) will help me "reduce" the size of my dataset. How do I go about choosing the ndim parameter?
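For reference, one common way to choose the number of components is to look at the percentage of variance each component explains, which pca returns as its fifth output. The sketch below is only an illustration under assumptions: the random matrix stands in for the real 200*119 data, and the 95% threshold is an arbitrary cutoff that is not part of the original problem.

```matlab
% Hypothetical stand-in for the real 200x119 dataset
rng(0);
data = randn(200, 119) * randn(119, 119);

% pca centers the data by default; explained is the percent variance per component
[coeff, score, ~, ~, explained] = pca(data);

% Smallest number of components whose cumulative variance reaches the cutoff
threshold = 95;   % assumed cutoff, tune for the classifier at hand
ndim = find(cumsum(explained) >= threshold, 1);

% Reduced dataset: 200 x ndim
reduced = score(:, 1:ndim);
disp(size(reduced));
```

Note that the columns of reduced are component scores (linear combinations of all 119 variables), not a subset of the original features.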
EDIT
I used the following code to reduce the dataset.
B=data;
sigma = cov(B);
%// Find eigenvalues and eigenvectors of the covariance matrix
[A,D] = eig(sigma);
vals = diag(D);
%// Sort their eigenvalues
[~,ind] = sort(abs(vals), 'descend');
%// Rearrange eigenvectors
Asort = A(:,ind);
%// Find mean subtracted data
Bm = bsxfun(@minus, B, mean(B,1));
%// Reproject data onto principal components
Bproject = Bm*Asort;
However, my Bproject is still of size 200*119.
I do not understand this. Please explain.
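For context on the size: multiplying by all 119 sorted eigenvectors only rotates the centered data, so the result keeps 119 columns. A reduction would come from projecting onto just the first k eigenvectors. The following is a minimal sketch of that idea, assuming random stand-in data and a hypothetical k = 10 chosen purely for illustration.

```matlab
% Hypothetical stand-in for the real 200x119 dataset
rng(0);
B = randn(200, 119);
k = 10;   % assumed number of components to keep

% Eigen-decomposition of the covariance matrix, sorted by decreasing eigenvalue
sigma = cov(B);
[A, D] = eig(sigma);
[~, ind] = sort(diag(D), 'descend');
Asort = A(:, ind);

% Center the data, then project onto the first k eigenvectors only
Bm = bsxfun(@minus, B, mean(B, 1));
Breduced = Bm * Asort(:, 1:k);   % 200 x k instead of 200 x 119
disp(size(Breduced));
```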