I'm trying to replicate some Stata functionality in R, but I'm stuck on reproducing e(sample) after a multiple correspondence analysis (MCA).

In Stata the code is this:

    clear

    set obs 10
    gen var1 = cond(_n <= 2, 0, 1)
    gen var2 = cond(_n == 1, 0, 1) 
    gen var3 = var2     

    mca var1 var2 var3, method(burt)
    predict var4 if e(sample)

The last command generates predicted values only for the observations used by mca.

In R, I run the MCA like this:

    # Install FactoMineR and factoextra if they are missing, then load them
    if (!require("FactoMineR")) {
      install.packages("FactoMineR")
      library("FactoMineR")
    }

    if (!require("factoextra")) {
      install.packages("factoextra")
      library("factoextra")
    }

    var1 <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1)
    var2 <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
    var3 <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1)

    df <- data.frame(var1, var2, var3)


    df$var1 <- as.factor(df$var1)
    df$var2 <- as.factor(df$var2)
    df$var3 <- as.factor(df$var3)

    mca4 <- MCA(df, ncp = 2, method = "Burt")
    mca4$call$marge.col

This gives me the same MCA results as in Stata, but I have not been able to replicate the last line of the Stata code, predict var4 if e(sample). I already tried predict.mca, but it does not do what I need: it returns values for the dimensions specified by ncp = 2, so I guess it does not do the same thing as Stata's predict command.

The results from Stata:

    mca var1 var2 var3, method(burt)

    Statistics for column categories in standard normalization

                 |          Overall          |        Dimension_1        
      Categories |    Mass  Quality   %inert |   Coord   Sqcorr  Contrib 
    -------------+---------------------------+---------------------------
    var1         |                           |                           
               0 |   0.067    1.101    0.188 |   1.786    1.101    0.213 
               1 |   0.267    1.101    0.047 |  -0.446    1.101    0.053 
    -------------+---------------------------+---------------------------
    var2         |                           |                           
               0 |   0.033    0.936    0.344 |   3.148    0.936    0.330 
               1 |   0.300    0.936    0.038 |  -0.350    0.936    0.037 
    -------------+---------------------------+---------------------------
    var3         |                           |                           
               0 |   0.033    0.936    0.344 |   3.148    0.936    0.330 
               1 |   0.300    0.936    0.038 |  -0.350    0.936    0.037 
    ---------------------------------------------------------------------

    predict var4 if e(sample)

The results of the predict command:

    var4
    2.912461
    .3913612
    -.4129778
    -.4129778
    -.4129778
    -.4129778
    -.4129778
    -.4129778
    -.4129778
    -.4129778
Nick Cox
komh13
  • R people might need the explanation that `e(sample)` is 1 if an observation was used in the last model fit and 0 otherwise. The "otherwise" might be that observations were automatically excluded because of missing values or that observations were deliberately excluded for whatever other reason(s). – Nick Cox Nov 28 '22 at 19:17
  • Just possibly -- I am no kind of R expert -- the answer is that there is **no equivalent**. R is not nearly so focused on the idea of a single dataset in memory (although Stata now has frames too). – Nick Cox Nov 28 '22 at 19:18
  • Guess you're after `mca4$svd$U[,1]`. The predict command in R will typically return predictions based on unseen data. In this case there seems to be no need for an e(sample) equivalent. – harre Nov 28 '22 at 20:07
  • There isn't really an `e(sample)` equivalent in R, though in some cases (particularly with modelling functions such as `lm()` and `glm()`), you can use `model.frame(object)` to return the data used to fit the model, or something like `na.omit(get_all_vars(object, data))`, which will grab all the variables used to fit the model and then listwise delete the rows with missing values. Neither of these works with `MCA()` from `FactoMineR`, but since the input to the function is a data frame, you could listwise delete it yourself and obviate the need for something like `e(sample)`. – DaveArmstrong Dec 16 '22 at 00:47
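
The listwise-deletion idea from the comments can be sketched in base R. This is only an illustration under stated assumptions: the data frame below, its missing value, the response `y`, and the names `in_sample`, `df_used`, and `used_rows` are all made up for the example and are not part of the original question. The logical vector `in_sample` plays roughly the role of Stata's `e(sample)`, assuming that missing values are the only reason a row would be excluded.

```r
# Hypothetical data with one missing value, to illustrate listwise deletion
df <- data.frame(
  var1 = factor(c(0, 0, 1, 1, NA)),
  var2 = factor(c(0, 1, 1, 1, 1)),
  var3 = factor(c(0, 1, 1, 1, 1))
)

# TRUE where a row has no missing values: a rough stand-in for e(sample)
in_sample <- complete.cases(df)

# Rows that would be passed on to the fitting function
df_used <- df[in_sample, ]

# For lm()/glm(), model.frame() recovers the rows actually used in the fit.
# y is a made-up response; its NA causes row 3 to be dropped by lm().
y <- c(1.2, 0.7, NA, 0.3, 0.9)
fit <- lm(y ~ var2, data = cbind(df, y))
used_rows <- rownames(model.frame(fit))
```

With an indicator like `in_sample`, scores computed on `df_used` can be written back into a full-length vector (for example one initialised with `rep(NA_real_, nrow(df))`), so that excluded rows stay `NA`, much as `predict ... if e(sample)` leaves them missing in Stata.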

0 Answers