I am unable to replicate in R a particular use case of the Stata margins
command:
margins var1, over(var2)
I've been trying to do so using the margins
package in R.
To provide a reproducible example, I used the mtcars dataset and exported it from R into Stata so we are using the same dataset in both programs:
R code:
library(foreign)
library(margins)
write.dta(mtcars, “mtcars.dta")
Stata code:
use "mtcars.dta", clear
Create an example linear regression model in both programs
Stata code:
quietly regress mpg cyl i.am c.wt##c.hp
R code:
x <- lm(mpg ~ cyl + factor(am) + hp * wt, data = mtcars)
The model output (not shown) is identical between the two programs
Compare the average marginal effects table for each variable in the model
Stata code and output:
margins, dydx(*)
Average marginal effects Number of obs = 32
Model VCE: OLS
Expression : Linear prediction, predict() dy/dx w.r.t. : cyl 1.am wt hp
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cyl | -.3708001 .5293674 -0.70 0.490 -1.45893 .7173301
1.am | -.0709546 1.374981 -0.05 0.959 -2.897268 2.755359
wt | -3.868994 .9170145 -4.22 0.000 -5.753944 -1.984043
hp | -.0249882 .0120345 -2.08 0.048 -.0497254 -.000251
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
R code and output:
xmarg <- margins(x)
summary(xmarg)
factor AME SE z p lower upper
am1 -0.0710 1.3750 -0.0516 0.9588 -2.7659 2.6240
cyl -0.3708 0.5294 -0.7005 0.4836 -1.4083 0.6667
hp -0.0250 0.0120 -2.0764 0.0379 -0.0486 -0.0014
wt -3.8690 0.9170 -4.2191 0.0000 -5.6663 -2.0717
As you can see, these two outputs are very similar to one another, as expected using the R margins
package.
Problem 1: Marginal predictions OVER the value of a variable
Stata Code and Output:
margins, over(cyl)
Predictive margins Number of obs = 32
Model VCE: OLS
Expression : Linear prediction, predict()
over : cyl
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cyl |
4 | 26.56699 .6390379 41.57 0.000 25.25342 27.88055
6 | 20.04662 .5797511 34.58 0.000 18.85492 21.23831
8 | 15.02406 .5718886 26.27 0.000 13.84853 16.19959
------------------------------------------------------------------------------
R Code and Output:
aggregate(fitted~cyl, data = xmarg, FUN = mean)
cyl fitted
1 4 26.56699
2 6 20.04662
3 8 15.02406
In the two examples above, the marginal prediction is identical between R and Stata. However, is there a way (short of doing it by hand) to generate the delta-method standard error for each marginal prediction as is done in the Stata table above?
Problem 2: Marginal predictions for a specific variable:
Stata Code and Output:
margins am
Predictive margins Number of obs = 32
Model VCE : OLS
Expression : Linear prediction, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
am |
0 | 20.11945 .6819407 29.50 0.000 18.7177 21.5212
1 | 20.0485 .9052764 22.15 0.000 18.18767 21.90932
------------------------------------------------------------------------------
R Code and Output:
aggregate(fitted~am, data = xmarg, FUN = mean)
am fitted
1 0 17.14737
2 1 24.39231
In this example, we are trying to replicate Stata’s “marginlist” argument in the margins
command by subsetting the dataset after prediction. This does not seem to be the right way. How can we replicate these results from Stata?
Problem 3: Marginal prediction of one variable over the value of another
Replicating this result is my main goal!
Stata Code and output
margins am, over(cyl)
Predictive margins Number of obs = 32
Model VCE : OLS
Expression : Linear prediction, predict()
over : cyl
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cyl#am |
4 0 | 26.61859 1.246074 21.36 0.000 24.05725 29.17993
4 1 | 26.54763 .7034599 37.74 0.000 25.10165 27.99362
6 0 | 20.07703 .6449805 31.13 0.000 18.75125 21.4028
6 1 | 20.00607 1.144518 17.48 0.000 17.65348 22.35866
8 0 | 15.0342 .6228319 24.14 0.000 13.75395 16.31445
8 1 | 14.96324 1.257922 11.90 0.000 12.37754 17.54894
------------------------------------------------------------------------------
R Code and Output:
aggregate(fitted ~ am + cyl, data = xmarg, FUN = mean)
am cyl fitted
1 0 4 22.83306
2 1 4 27.96721
3 0 6 19.06359
4 1 6 21.35732
5 0 8 15.08720
6 1 8 14.64519
As you can see, the point estimates are now substantially different and again there is no SE table. Solving Problem 1 and Problem 2 above will likely allow the solution to Problem 3.