
I am trying to save the coordinates produced by a multiple correspondence analysis (MCA) so that each observation carries a set of coordinates for each of the two categorical variables used in the MCA (so that I can calculate distances later on).

I can retrieve a list of values using the get_mca_var() function. However, I don't know how to match these to the corresponding observations. I used two active categorical variables in the MCA, so I think I would have to create 4 new variables: two with the coordinates of variable 1 and two with the coordinates of variable 2. Can anybody help me out with this?

I'm using FactoMineR and factoextra packages for this.

A sample of my data (dput() output) is at the end of this post.

The law_id variable identifies a lawyer, the class_section_id variable identifies the type of case she was working on, and the firmsize_numpat variable identifies the size of the law firm (no labels). I then run the MCA using the following command, leaving firmsize_numpat as supplementary:

```r
res.MCA <- MCA(sample.active, quali.sup = 3, ncp = 2, method = "Burt")
```

I essentially want to do two things now:

  1. Save the coordinates as new variables so that I can use them later to calculate the distance from lawyers to the categories of cases. I know that I can retrieve the coordinates using res.MCA$var$coord or get_mca_var(res.MCA), but I don't know how to save them as new variables (my background, unfortunately, is in Stata, so every little thing in R is still a struggle for me).
  2. I want to plot the lawyers and categories (distinctly visible) and color code the lawyers by the category of firmsize_numpat.
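
To make the first point concrete, here is a minimal sketch of the kind of result I am after, run on hypothetical toy data rather than the sample in this post. The names `toy`, `lookup` and `toy_coords` are made up for illustration. It assumes that the row names of `res$var$coord` are either the bare category levels or `variable_level`, which may depend on the FactoMineR version and on whether levels are shared across variables.

```r
library(FactoMineR)

set.seed(1)
# Hypothetical toy data standing in for sample.active
toy <- data.frame(
  law_id           = factor(sample(c("A", "B", "C", "D"), 40, replace = TRUE)),
  class_section_id = factor(sample(c("s1", "s2", "s3"), 40, replace = TRUE)),
  firmsize_numpat  = factor(sample(c("small", "large"), 40, replace = TRUE))
)

res <- MCA(toy, quali.sup = 3, ncp = 2, method = "Burt", graph = FALSE)

# One row per active category (lawyers and case types), one column per dimension
cat_coord <- res$var$coord

# For each observation, look up the coordinates of the category it takes on `var`
lookup <- function(var) {
  lev <- as.character(toy[[var]])
  idx <- match(lev, rownames(cat_coord))
  # Assumption: if bare levels are not found, try "variable_level" row names
  if (anyNA(idx)) idx <- match(paste(var, lev, sep = "_"), rownames(cat_coord))
  stopifnot(!anyNA(idx))  # fail loudly if the naming scheme is different
  out <- data.frame(unname(cat_coord[idx, 1]), unname(cat_coord[idx, 2]))
  names(out) <- paste0(var, "_dim", 1:2)
  out
}

# Four new columns: two per active variable
toy_coords <- cbind(toy, lookup("law_id"), lookup("class_section_id"))

# Euclidean distance between each lawyer and the case category of that row
toy_coords$dist <- with(toy_coords, sqrt(
  (law_id_dim1 - class_section_id_dim1)^2 +
  (law_id_dim2 - class_section_id_dim2)^2
))

head(toy_coords)
```

The coordinates of the individual rows are in `res$ind$coord`, in the same row order as the input data, so they could be attached with `cbind()` directly if those are needed instead.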

I've tried a bunch of different things using

```r
fviz_mca_var(res.MCA, repel = TRUE, ggtheme = theme_minimal(),
             geom = c("point", "text"))
```

and the habillage option, but that one only seems to work for individual observations and does not let me color code the law_id variable according to another variable (here: firmsize_numpat).
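
For the second point, the plot I have in mind could perhaps be built by hand with ggplot2 from the category coordinates, instead of going through `fviz_mca_var()`. The sketch below uses hypothetical toy data and made-up names (`toy`, `cats`, `firm_by_law`). It assumes each lawyer belongs to exactly one firm size, and that the category row names are either the bare levels or prefixed with `law_id_`.

```r
library(FactoMineR)
library(ggplot2)

set.seed(1)
# Hypothetical toy data: every lawyer is tied to a single firm size
firm_of <- c(A = "small", B = "small", C = "large", D = "large")
law <- sample(names(firm_of), 40, replace = TRUE)
toy <- data.frame(
  law_id           = factor(law),
  class_section_id = factor(sample(c("s1", "s2", "s3"), 40, replace = TRUE)),
  firmsize_numpat  = factor(unname(firm_of[law]))
)

res <- MCA(toy, quali.sup = 3, ncp = 2, method = "Burt", graph = FALSE)

# Coordinates of all active categories as a plain data frame
cats <- data.frame(
  label = rownames(res$var$coord),
  dim1  = unname(res$var$coord[, 1]),
  dim2  = unname(res$var$coord[, 2])
)

# Flag lawyer categories (assumption: bare level or "law_id_<level>" row names)
law_levels <- levels(toy$law_id)
is_law <- cats$label %in% c(law_levels, paste("law_id", law_levels, sep = "_"))
cats$kind <- ifelse(is_law, "lawyer", "case category")

# Firm size of each lawyer, taken from the first row in which the lawyer appears
firm_by_law <- setNames(as.character(toy$firmsize_numpat), toy$law_id)
cats$firmsize <- ifelse(is_law,
                        unname(firm_by_law[sub("^law_id_", "", cats$label)]),
                        NA)

# Shape separates lawyers from case categories; colour shows firm size
p <- ggplot(cats, aes(dim1, dim2)) +
  geom_point(aes(colour = firmsize, shape = kind), size = 3) +
  geom_text(aes(label = label), vjust = -1, size = 3) +
  labs(x = "Dim 1", y = "Dim 2", colour = "firmsize_numpat", shape = NULL) +
  theme_minimal()
p
```

Case categories have no firm size, so they are drawn in ggplot2's default NA colour.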

```r
structure(list(law_id = structure(c(19L, 12L, 22L, 20L, 26L, 
4L, 7L, 28L, 10L, 14L, 2L, 18L, 24L, 24L, 17L, 9L, 28L, 7L, 28L, 
21L, 23L, 8L, 24L, 15L, 24L, 6L, 9L, 1L, 17L, 4L, 23L, 24L, 4L, 
10L, 25L, 13L, 24L, 22L, 9L, 11L, 16L, 8L, 24L, 3L, 9L, 5L, 23L, 
27L, 25L, 17L), .Label = c("1604", "1898", "2181", "3428", "4795", 
"5507", "5953", "6269", "6744", "8368", "8759", "9999", "10265", 
"11235", "12622", "12833", "13489", "15744", "16595", "20200", 
"20728", "20731", "22433", "23876", "23926", "24150", "24935", 
"26241"), class = "factor"), class_section_id = structure(c(2L, 
1L, 3L, 5L, 3L, 1L, 2L, 7L, 6L, 5L, 6L, 6L, 1L, 5L, 3L, 2L, 1L, 
5L, 1L, 2L, 5L, 6L, 6L, 5L, 6L, 6L, 2L, 6L, 3L, 2L, 2L, 5L, 4L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 2L, 5L, 6L, 3L, 7L, 1L, 3L, 5L, 5L, 
7L), .Label = c("1", "2", "3", "6", "7", "8", "9"), class = "factor"), 
    firmsize_numpat = structure(c(1L, 4L, 3L, 1L, 2L, 3L, 5L, 
    4L, 5L, 1L, 5L, 1L, 5L, 5L, 4L, 5L, 4L, 5L, 4L, 3L, 3L, 3L, 
    5L, 2L, 5L, 2L, 5L, 1L, 4L, 3L, 3L, 5L, 3L, 5L, 2L, 1L, 5L, 
    3L, 5L, 2L, 1L, 3L, 5L, 1L, 5L, 4L, 3L, 4L, 2L, 4L), .Label = c("0", 
    "1", "2", "3", "4"), class = "factor")), row.names = c(123895L, 
71155L, 152220L, 148739L, 175015L, 24338L, 43379L, 192748L, 60320L, 
82138L, 11576L, 118608L, 172718L, 172873L, 98145L, 49021L, 192841L, 
43502L, 192770L, 152160L, 163562L, 45490L, 172825L, 92072L, 172765L, 
38913L, 49067L, 9823L, 98123L, 24386L, 163580L, 172887L, 24383L, 
60235L, 173440L, 73281L, 172708L, 152224L, 49003L, 62174L, 94485L, 
45527L, 172775L, 13238L, 49211L, 34276L, 163557L, 181681L, 173435L, 
98126L), class = "data.frame")
```
camille
GLump
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Dec 14 '18 at 17:05
  • looks like you might want to use `match` but one can't tell before you provide more detail – Chris Ruehlemann Dec 14 '18 at 17:22
  • Sorry you two, I tried to be a bit more specific and provide a data example. Thanks for letting me know – GLump Dec 14 '18 at 17:51
  • Looking at the docs, `habillage` is an argument to `fviz_mca_ind`, not `fviz_mca_var` – camille Dec 14 '18 at 18:40
