I am trying to save the coordinates created through a multiple correspondence analysis so that each observation has a set of coordinates for two types of categorical variables used in the MCA (so that I can calculate distances later on).
I can retrieve a list of values using the get_mca_var() command. However, I don't know how to match these to the corresponding observations. I used two categorical variables in the MCA (multiple correspondence analysis), so I think I would have to create 4 new variables: two with the coordinates of variable 1 and the other two with the coordinates of variable 2. Can anybody help me out with this?
I'm using the FactoMineR and factoextra packages for this.
Below is a sample of my data.
The law_id variable identifies a lawyer, the class_section_id variable identifies the type of case she was working on, and the firmsize_numpat variable identifies the size of the law firm (no label). I then run the MCA using the following command, leaving firmsize_numpat as supplementary:
res.MCA <- MCA(sample.active, quali.sup = 3, ncp = 2, method = "Burt")
I essentially want to do two things now:
- save the coordinates as new variables so that I can use them later to calculate the distance from lawyers to the category of cases. I know that I can retrieve the coordinates using res.MCA$var$coord or get_mca_var(res.MCA), but I don't know how to save them as a new variable (my background, unfortunately, is in Stata, so every little thing in R is still a struggle for me).
- I want to plot the lawyers and categories (distinctly visible) and color code the lawyers by the category of firmsize_numpat.
I've tried a bunch of different things using
fviz_mca_var(res.MCA, repel = TRUE, ggtheme = theme_minimal(),
geom = c("point", "text"))
and the habillage option, but that one only seems to work for individual observations and does not let me color code the law_id variable according to another variable (here: firmsize_numpat).
structure(list(law_id = structure(c(19L, 12L, 22L, 20L, 26L,
4L, 7L, 28L, 10L, 14L, 2L, 18L, 24L, 24L, 17L, 9L, 28L, 7L, 28L,
21L, 23L, 8L, 24L, 15L, 24L, 6L, 9L, 1L, 17L, 4L, 23L, 24L, 4L,
10L, 25L, 13L, 24L, 22L, 9L, 11L, 16L, 8L, 24L, 3L, 9L, 5L, 23L,
27L, 25L, 17L), .Label = c("1604", "1898", "2181", "3428", "4795",
"5507", "5953", "6269", "6744", "8368", "8759", "9999", "10265",
"11235", "12622", "12833", "13489", "15744", "16595", "20200",
"20728", "20731", "22433", "23876", "23926", "24150", "24935",
"26241"), class = "factor"), class_section_id = structure(c(2L,
1L, 3L, 5L, 3L, 1L, 2L, 7L, 6L, 5L, 6L, 6L, 1L, 5L, 3L, 2L, 1L,
5L, 1L, 2L, 5L, 6L, 6L, 5L, 6L, 6L, 2L, 6L, 3L, 2L, 2L, 5L, 4L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 2L, 5L, 6L, 3L, 7L, 1L, 3L, 5L, 5L,
7L), .Label = c("1", "2", "3", "6", "7", "8", "9"), class = "factor"),
firmsize_numpat = structure(c(1L, 4L, 3L, 1L, 2L, 3L, 5L,
4L, 5L, 1L, 5L, 1L, 5L, 5L, 4L, 5L, 4L, 5L, 4L, 3L, 3L, 3L,
5L, 2L, 5L, 2L, 5L, 1L, 4L, 3L, 3L, 5L, 3L, 5L, 2L, 1L, 5L,
3L, 5L, 2L, 1L, 3L, 5L, 1L, 5L, 4L, 3L, 4L, 2L, 4L), .Label = c("0",
"1", "2", "3", "4"), class = "factor")), row.names = c(123895L,
71155L, 152220L, 148739L, 175015L, 24338L, 43379L, 192748L, 60320L,
82138L, 11576L, 118608L, 172718L, 172873L, 98145L, 49021L, 192841L,
43502L, 192770L, 152160L, 163562L, 45490L, 172825L, 92072L, 172765L,
38913L, 49067L, 9823L, 98123L, 24386L, 163580L, 172887L, 24383L,
60235L, 173440L, 73281L, 172708L, 152224L, 49003L, 62174L, 94485L,
45527L, 172775L, 13238L, 49211L, 34276L, 163557L, 181681L, 173435L,
98126L), class = "data.frame")