Principal Components for categorical Variables

Question

I have data that contains both continuous and categorical variables. I want to find principal components, as one can with the `prcomp` function in R for continuous variables. I've seen the `MFA` function in the FactoMineR package. In `MFA()`, I put all the categorical variables in one group and all the continuous variables in another. After running the function and printing the result, `res = MFA(...)`, I get:

       name                 description                                           
1  "$eig"               "eigenvalues"                                         
2  "$separate.analyses" "separate analyses for each group of variables"       
3  "$group"             "results for all the groups"                          
4  "$partial.axes"      "results for the partial axes"                        
5  "$inertia.ratio"     "inertia ratio"                                       
6  "$ind"               "results for the individuals"                         
7  "$quanti.var"        "results for the quantitative variables"              
8  "$quali.var"         "results for the categorical variables"               
9  "$quanti.var.sup"    "results for the quantitative supplementary variables"
10 "$summary.quanti"    "summary for the quantitative variables"              
11 "$summary.quali"     "summary for the categorical variables"               
12 "$global.pca"        "results for the global PCA"

I don't know where the principal components are. All I can see are the eigenvalues, via `res$eig`. I'm trying to reduce the dimensionality of the data, but I can't work out where to find the eigenvectors (the PCs) or the coordinates of the original data along the PCs. Running `ls(res$ind)` gives me "coord", "cos2", "contrib"; I can't make out what these are, or even whether I need them.

You'll find the eigenvectors in `res$quanti.var$coord`, and the coordinates of qualitative variables in `res$quali.var$coord`. `contrib` gives the contribution of each modality to the construction of the axis (i.e., the principal component). — scoa, Sep 09 '15 at 09:57
@scoa, why are the row names of `res$quanti.var$coord` the field names of the qualitative variables, and likewise for `res$quali.var$coord`? Also, `prcomp` gives a rotation matrix that signifies the components of each data row in the new dimensions. There isn't any such entry here; all I can see are the data frames above. Thank you for the quick response. — sandep, Sep 09 '15 at 11:13
Why don't you introduce dummy binary variables for the categorical values? That is what most people would do to perform PCA. — Gaurav, Sep 09 '15 at 11:33
I meant value, sorry about that. "Modality" is the French word. The row names of `res$quanti.var$coord` should be the names of the quantitative variables. If they are not, there might be a problem with your MFA call; you should show a reproducible example. As for the rotation matrix, the coordinates of individuals are in `res$ind$coord`. — scoa, Sep 09 '15 at 11:34
@Gaurav Oh my god, can I do that? I googled a bit and found that it is not really advisable. Are you sure it's good and reliable? And if it is good enough, can I introduce dummy variables for a set of categorical variables? I mean, if there are two categorical variables, can I group them and create dummy variables for the two taken as a whole? — sandep, Sep 09 '15 at 11:42
Yes you can do that and it works well. You can have a dummy variable for each category no matter how many categories you have in each categorical variable. PCA will filter out redundant categories. — Gaurav, Sep 09 '15 at 11:48
@scoa, oops, sorry, that was a typo: `res$quali.var$coord` gives the qualitative variables' row names. But what does it signify? I'm getting new-dimension values for every value of the categorical variables. Also, `res$ind$coord` gives the contributions of the entries in the new dimensions, but do these dimensions include the new dimensions obtained from the categorical variables too? — sandep, Sep 09 '15 at 11:50
That is how geometrical data analysis techniques handle categorical variables. See the literature on multiple correspondence analysis and `?MCA`. Basically, you could think of it as adding one dummy variable for each value of each categorical variable. Maybe you should ask your question on http://stats.stackexchange.com. — scoa, Sep 09 '15 at 11:59
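
The dummy-variable idea from the comments above can be sketched in base R as follows. This is only an illustration under assumptions: the data frame `df` and its columns `x1`, `x2`, `f1`, `f2` are made up, and plain PCA on indicator columns does not apply the weighting that MCA or MFA uses. Note that in `prcomp` output the per-row scores are in `$x`, while `$rotation` holds the variable loadings.

```r
set.seed(1)
n <- 30

# Hypothetical toy data: two continuous and two categorical columns
df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  f1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  f2 = factor(sample(c("yes", "no"), n, replace = TRUE))
)

# One 0/1 indicator column per category of each factor (no intercept)
dummies <- do.call(
  cbind,
  lapply(df[c("f1", "f2")], function(f) model.matrix(~ f - 1))
)

# Combine continuous columns and indicators, then run an ordinary PCA
X <- cbind(as.matrix(df[c("x1", "x2")]), dummies)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

scores   <- pca$x         # coordinates of each row on the components
loadings <- pca$rotation  # eigenvectors: one column per component
round(pca$sdev, 6)        # trailing values are (numerically) zero
```

The indicator columns of each factor sum to one, so each factor contributes one redundant direction; those show up as components with essentially zero standard deviation and can be dropped.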
@scoa I have a couple of questions: 1) Can we just put all the categorical variables into one group and all the continuous variables into another, and run `MFA` from FactoMineR? 2) Do the resulting dimensions include those obtained from both continuous and categorical variables, or just the continuous ones? — sandep, Sep 09 '15 at 12:02
I am not sure about your first question. For the second, the dimensions are produced by both continuous and categorical variables. This is actually the very principle of MFA. — scoa, Sep 10 '15 at 08:48
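
A minimal sketch of the MFA call discussed in this thread, assuming FactoMineR is installed. The data frame `df` and its column names are hypothetical. `group` gives the number of columns in each group, in column order, and `type` marks each group as scaled continuous (`"s"`) or categorical (`"n"`). This shows where the pieces mentioned in the comments live; it does not settle whether this grouping is the right analysis for a given data set.

```r
library(FactoMineR)

set.seed(1)
n <- 30

# Hypothetical toy data: continuous columns first, then categorical ones
df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  f1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  f2 = factor(sample(c("yes", "no"), n, replace = TRUE))
)

res <- MFA(df,
           group = c(2, 2),          # 2 continuous columns, then 2 categorical
           type = c("s", "n"),       # "s" = scaled continuous, "n" = categorical
           name.group = c("continuous", "categorical"),
           ncp = 3,                  # number of dimensions to keep
           graph = FALSE)

res$eig                   # eigenvalues and percentage of variance per dimension
scores <- res$ind$coord   # coordinates of each row on the kept dimensions
res$quanti.var$coord      # coordinates of the continuous variables
res$quali.var$coord       # coordinates of each category of the factors
```

Here `scores` plays the role of `prcomp(...)$x`: one row per individual, one column per retained dimension, built from both groups of variables.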
