I have a set of 100 observations, each with 45 characteristics. Each observation also has a label attached, which I want to predict from those 45 characteristics. So the input is a matrix of dimension 45 x 100 and the target is a matrix of dimension 1 x 100.
The thing is that I want to know how many of those 45 characteristics are relevant in my data set, which is basically principal component analysis, and I understand that I can do this with the Matlab function processpca.
Could you please tell me how I can do this? Suppose that the input matrix is x, with 45 rows and 100 columns, and y is a vector with 100 elements.
-
@David Heffernan: That's about the vaguest comment I've seen on SO yet. @Jack: R is similar to Matlab syntactically. You could take a look at http://www.uga.edu/strata/software/pdf/pcaTutorial.pdf if you want to go that route. – aqua Feb 10 '11 at 20:44
-
Yeah, R's what you want for PCA – David Heffernan Feb 10 '11 at 20:45
-
@David Heffernan: What about Matlab's code for principal component analysis is lacking so much that one *has* to switch to R to get decent results? – Jonas Feb 10 '11 at 21:17
-
@Jonas I'm sure you can do it in Matlab and I'm sure it works well, it's just that stats is easiest in R. – David Heffernan Feb 10 '11 at 21:27
-
What's more, if this question had been asked with an R tag it would be brimming over with helpful answers by now. – David Heffernan Feb 10 '11 at 21:29
-
You might find this post useful: http://stackoverflow.com/questions/4402110/principal-component-analysis-in-matlab/4403027#4403027 – Amro Feb 11 '11 at 03:36
5 Answers
Assuming that you want to construct a model of the 1x100 vector, based on the 45x100 matrix, I am not convinced that PCA will do what you think. PCA can be used to select variables for model estimation, but this is a somewhat indirect way to gather a set of model features. Anyway, I suggest reading both:
and...
...both of which provide code in MATLAB not requiring any Toolboxes.

Have you tried COEFF = princomp(x)?
COEFF = princomp(X) performs principal components analysis (PCA) on the n-by-p data matrix X, and returns the principal component coefficients, also known as loadings. Rows of X correspond to observations, columns to variables. COEFF is a p-by-p matrix, each column containing coefficients for one principal component. The columns are in order of decreasing component variance.
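As a rough, toolbox-free sketch of the computation described above, applied to data shaped like the question's: because the quoted documentation expects observations in rows, the 45-by-100 matrix x has to be transposed first. The random x below is placeholder data, the variable names are illustrative, and the 95% variance threshold is an arbitrary assumption rather than anything stated in the question.

```matlab
% Placeholder data standing in for the real 45-by-100 input matrix
rng(0);
x = randn(45, 100);

X  = x';                                 % 100-by-45: rows = observations
Xc = bsxfun(@minus, X, mean(X, 1));      % center each variable (column)

% PCA via the singular value decomposition of the centered data
[~, S, V] = svd(Xc, 'econ');
coeff  = V;                              % 45-by-45 loadings, one component per column
latent = diag(S).^2 / (size(X, 1) - 1);  % variance of each component, decreasing
score  = Xc * coeff;                     % data expressed in component coordinates

explained = cumsum(latent) / sum(latent);
k = find(explained >= 0.95, 1);          % assumed threshold: 95% of total variance
fprintf('%d components explain 95%% of the variance\n', k);
```

Note that each component is a linear combination of all 45 characteristics, so k counts directions of variance rather than individual original characteristics.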

-
I have tried this, but the thing is that I want to see which one of those characteristics are important within my data. I want to build a Neural Network and I want to pass as input only those characteristics. – Simon Feb 11 '11 at 10:47
You should compute the correlation matrix. In the following example, MATLAB computes the correlation matrix with the 'corr' function:
http://www.mathworks.com/help/stats/feature-transformation.html#f75476
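A minimal sketch of that idea for data shaped like the question's, swapping in the base MATLAB function corrcoef so that no toolbox is needed. The random x and y are placeholders, y is assumed to be numeric, and ranking by absolute correlation with the target is one possible heuristic, not something the linked page prescribes.

```matlab
rng(0);
x = randn(45, 100);                 % placeholder for the 45-by-100 inputs
y = randn(1, 100);                  % placeholder for the 1-by-100 targets

R = corrcoef(x');                   % 45-by-45 correlations between characteristics

% Correlation of each characteristic with the target
r = zeros(45, 1);
for i = 1:45
    c = corrcoef(x(i, :), y);       % 2-by-2 matrix; off-diagonal is the correlation
    r(i) = c(1, 2);
end

[~, order] = sort(abs(r), 'descend');
disp(order(1:5)');                  % indices of the 5 most correlated characteristics
```

Large off-diagonal entries in R point to redundant characteristics, while r gives a first, purely linear, indication of which ones relate to the label.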
-
Please place all relevant code and documentation here. A link can change over time and make this answer invalid. – rfornal Feb 19 '15 at 20:31
From your question I gather that you don't need to do it in MATLAB; you just want to analyze your dataset. In my opinion, the key is visualization of the dependencies.
If you're not forced to do the analysis in MATLAB, I'd suggest you try more specialized software such as WEKA (www.cs.waikato.ac.nz/ml/weka/) or RapidMiner (rapid-i.com). Both tools provide PCA and other dimension reduction algorithms, and they include nice visualization tools.

Your use case sounds like a combination of Classification and Feature Selection.
Statistics Toolbox offers a lot of good capabilities in this area. The toolbox provides access to a number of classification algorithms including
- Naive Bayes classifiers
- Bagged decision trees (aka Random Forests)
- Binomial and multinomial logistic regression
- Linear discriminant analysis
You also have a variety of options available for feature selection, including
- sequentialfs (forward and backward feature selection)
- relieff
- TreeBagger, which also supports options for feature selection and estimating variable importance.
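As a hedged illustration of the first option, the sketch below runs forward selection with sequentialfs, using misclassification count under linear discriminant analysis (classify) as the criterion. It requires Statistics Toolbox. The data are synthetic placeholders, the label is assumed to be a binary class, and the choice of classifier and criterion is an assumption made for the example.

```matlab
rng(0);
X = randn(100, 45);                       % placeholder: rows = observations
y = double(X(:, 3) + X(:, 7) > 0);        % placeholder binary label (assumption)

% Criterion: number of misclassified held-out points, using linear
% discriminant analysis (classify) as the assumed classifier
critfun = @(Xtrain, ytrain, Xtest, ytest) ...
    sum(ytest ~= classify(Xtest, Xtrain, ytrain));

% Forward selection with the default 10-fold cross-validation
inmodel  = sequentialfs(critfun, X, y);
selected = find(inmodel);                 % indices of the chosen characteristics
disp(selected);
```

With the question's data, X would be x' and y would be y'; unlike PCA, the result is a subset of the original characteristics, which can be passed directly to a classifier or neural network.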
Alternatively, you can use some of Optimization Toolbox's capabilities to write your own custom equations to estimate variable importance.
A couple of months back, I did a webinar for The MathWorks titled "Computational Statistics: Getting Started with Classification using MATLAB". You can watch the webinar at
http://www.mathworks.com/company/events/webinars/wbnr51468.html?id=51468&p1=772996255&p2=772996273
The code and the data set for the examples are available at MATLAB Central
http://www.mathworks.com/matlabcentral/fileexchange/28770
With all this said and done, many people use Principal Component Analysis as a pre-processing step before applying classification algorithms. PCA gets used a lot
- When you need to extract features from images
- When you're worried about multicollinearity
