What is the equivalent of e(sample) from Stata in R?

Question

I'm trying to replicate some Stata functionality in R, but I'm stuck on `e(sample)`, which I use after running a multiple correspondence analysis (MCA).

In Stata the code is this:

    clear

    set obs 10
    gen var1 = cond(_n <= 2, 0, 1)
    gen var2 = cond(_n == 1, 0, 1) 
    gen var3 = var2     

    mca var1 var2 var3, method(burt)
    predict var4 if e(sample)

The last command generates predicted values only for the observations used by mca.

In R, I have been doing this to do mca:

    if (!require("FactoMineR")) {
      install.packages("FactoMineR")
      library("FactoMineR")
    }

    if (!require("factoextra")) {
      install.packages("factoextra")
      library("factoextra")
    }

    var1 <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1)
    var2 <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
    var3 <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1)

    df <- data.frame(var1, var2, var3)


    df$var1 <- as.factor(df$var1)
    df$var2 <- as.factor(df$var2)
    df$var3 <- as.factor(df$var3)

    mca4 <- MCA(df, ncp = 2, method = "Burt")
    mca4$call$marge.col

This gives me the same MCA results as in Stata, but I have not been able to replicate the last line of the Stata code, `predict var4 if e(sample)`. I tried `predict.mca`, but it does not do what I need: it returns values for the dimensions specified by `ncp = 2`, so I assume it does not do the same thing as Stata's `predict` command.

The results from Stata:

    mca var1 var2 var3, method(burt)

    Statistics for column categories in standard normalization

                 |          Overall          |        Dimension_1
      Categories |    Mass  Quality   %inert |   Coord   Sqcorr  Contrib
    -------------+---------------------------+---------------------------
    var1         |                           |
               0 |   0.067    1.101    0.188 |   1.786    1.101    0.213
               1 |   0.267    1.101    0.047 |  -0.446    1.101    0.053
    -------------+---------------------------+---------------------------
    var2         |                           |
               0 |   0.033    0.936    0.344 |   3.148    0.936    0.330
               1 |   0.300    0.936    0.038 |  -0.350    0.936    0.037
    -------------+---------------------------+---------------------------
    var3         |                           |
               0 |   0.033    0.936    0.344 |   3.148    0.936    0.330
               1 |   0.300    0.936    0.038 |  -0.350    0.936    0.037
    ---------------------------------------------------------------------

    predict var4 if e(sample)

The results of the predict command:

    var4
    2.912461
    .3913612
    -.4129778
    -.4129778
    -.4129778
    -.4129778
    -.4129778
    -.4129778
    -.4129778
    -.4129778

R people might need the explanation that `e(sample)` is 1 if an observation was used in the last model fit and 0 otherwise. The "otherwise" might be that observations were automatically excluded because of missing values or that observations were deliberately excluded for whatever other reason(s). — Nick Cox, Nov 28 '22 at 19:17
Just possibly -- I am no kind of R expert -- the answer is that there is **no equivalent**. R is not nearly so focused on the idea of a single dataset in memory (although Stata now has frames too). — Nick Cox, Nov 28 '22 at 19:18
I guess you're after `mca4$svd$U[,1]`. The predict command in R will typically return predictions based on unseen data. Also, in this case there seems to be no need for an `e(sample)` equivalent. — harre, Nov 28 '22 at 20:07
There isn't really an `e(sample)` equivalent in R. In some cases, particularly with modelling functions such as `lm()` and `glm()`, you can use `model.frame(object)` to return the data used to fit the model, or something like `na.omit(get_all_vars(object, data))`, which grabs all the variables used in the model and then listwise deletes the rows with missing values. Neither of these works with `MCA()` from `FactoMineR`, but since the input to that function is a data frame, you could listwise delete it yourself and obviate the need for something like `e(sample)`. — DaveArmstrong, Dec 16 '22 at 00:47
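
The suggestions in the comments above can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy data frame, the `used` indicator, and the `yhat` column are hypothetical names, not part of the question, and `lm()` stands in for the model because `model.frame()` is documented to work with it.

```r
# Toy data with one missing value (hypothetical; not the data from the question)
df <- data.frame(
  y  = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(0, 1, NA, 1, 0, 1)
)

# Indicator playing the role of e(sample): TRUE if the row has no missing
# values in the variables that enter the fit
vars <- c("y", "x1", "x2")
used <- complete.cases(df[, vars])

# Fit on the complete rows only (listwise deletion done by hand)
fit <- lm(y ~ x1 + x2, data = df[used, ])

# Write fitted values back only for the rows that were used; the rest stay NA
df$yhat <- NA_real_
df$yhat[used] <- fitted(fit)

# For lm()/glm(), the rows kept by the model can also be recovered afterwards
# from the row names of the model frame
fit_all <- lm(y ~ x1 + x2, data = df)
used_from_model <- rownames(df) %in% rownames(model.frame(fit_all))
```

The same pattern should carry over to `MCA()`: pass `df[used, ]` to the function and assign whatever row scores the fit returns back to the rows where `used` is `TRUE`. Whether those scores match the values from Stata's `predict` is a separate question that this sketch does not settle.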
