specifying probability weights in R without using Lumley survey package

Question

I would really appreciate any help with specifying probability weights in R without using the Lumley survey package. I am conducting mediation analysis in R using the Imai et al mediation package, which does not currently support svyglm.

The code I am currently running is:

olsmediator_basic <- lm(poledu ~ gateway_strict_alt + gender_n + spline1 + spline2 + spline3,
   data = unifiedanalysis, weights = designweight)

However, I'm unsure if this is weighting the data correctly. The reason is that this code yields standard errors that differ from those I am getting in Stata. The Stata code I am running is:

reg poledu gateway_strict_alt gender_n spline1 spline2 spline3 [pweight=designweight]

I was wondering if the weights option in R may not be for inverse probability weights, but I was unable to determine this from the documentation, this forum or elsewhere. If I am missing something, I really apologize - I am new to R as well as to this forum.

Thank you in advance for your help.

This would be easier if you gave a reproducible example. What values are in `designweight`: the survey probabilities or the inverse probabilities? If you wish to weight by the inverse probabilities, you have to give these as the weights. — mnel, Jun 15 '12 at 00:23
Thanks for responding mnel and sorry about that. The values in designweight are the inverse probability weights. Anything I should be doing differently? — sabaya, Jun 15 '12 at 00:35
Here is a link to a guide to asking good questions on Stack Overflow: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Following its suggestions might increase your chances of receiving a helpful reply. A reproducible example would help, as would including the output from R and Stata when running that example. — Mark Miller, Jun 15 '12 at 08:51

score 2 · Answer 1 · answered Jun 15 '12 at 23:58

The R documentation specifies that the weights parameter of the lm function is inversely proportional to the variance of the observations. This is the definition of analytic weights, or aweights in Stata.

Have a look at the ipw package for inverse probability weighting.
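
To make the distinction above concrete, here is a minimal base-R sketch on simulated data (the variables `x`, `y`, `w` are made up for illustration and are not the question's data). It assumes that Stata's `[pweight=]` in `reg` amounts to weighted least squares point estimates combined with a sandwich (robust, HC1-scaled) variance; under that assumption the coefficients from `lm(..., weights = )` are unchanged and only the standard errors need replacing.

```r
# Simulated data; every name below is illustrative, not from the question.
set.seed(1)
n <- 200
x <- rnorm(n)
w <- runif(n, 1, 5)                          # stand-in for inverse probability weights
y <- 1 + 2 * x + rnorm(n, sd = 1 + abs(x))   # heteroskedastic errors
d <- data.frame(y, x, w)

# Weighted least squares: point estimates are the same under either weight type
fit <- lm(y ~ x, data = d, weights = w)

# Model-based SEs: these treat w as inverse-variance (analytic) weights
se_model <- sqrt(diag(vcov(fit)))

# Sandwich SEs computed by hand, with the n / (n - k) small-sample factor (HC1)
X <- model.matrix(fit)
e <- residuals(fit)                          # raw residuals y - X %*% coef(fit)
k <- ncol(X)
bread <- solve(crossprod(X, w * X))          # (X' W X)^{-1}
meat  <- crossprod(X * (w * e))              # sum_i w_i^2 e_i^2 x_i x_i'
vc_robust <- n / (n - k) * bread %*% meat %*% bread
se_robust <- sqrt(diag(vc_robust))

cbind(estimate = coef(fit), se_model, se_robust)
```

If the assumption holds for your Stata version, `se_robust` is the column to compare against the `pweight` output. Because `vc_robust` is an ordinary covariance matrix, this stays outside the survey package, though whether a downstream package accepts a user-supplied covariance matrix would need to be checked in its documentation.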

score -2 · Answer 2 · answered Feb 23 '17 at 20:26

To correct a previous answer: I looked up the documentation and found the following description of weights in lm:

Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations (including the case that there are w_i observations equal to y_i and the data have been summarized).

These are actually frequency weights (fweights in Stata). They replicate each observation the number of times given by the weight vector. Probability weights, on the other hand, refer to the probability that an observation's group is sampled from the population. Using them adjusts the impact of each observation on the coefficients, but not on the standard errors, as they don't change the number of observations represented in the sample.

I know this question was nearly 5 years old when I answered it, but once I stumbled on it the whole office got into a weighting discussion — EconomySizeAl, Feb 23 '17 at 20:27
This is incorrect. `lm` and `glm` actually implement analytical weights (`aweight` in Stata), also known as inverse-variance weights (which matches the description you give). See for example [this link](http://bc.bojanorama.pl/2015/09/linear-models-with-weighted-observations/). — Milan Bouchet-Valat, May 02 '17 at 13:36
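
The frequency-weight question raised in this thread can be checked directly. The sketch below uses simulated data (`x`, `y` and the integer counts `f` are invented for illustration): it fits `lm` once with integer weights and once on a data frame where each row is physically repeated `f` times. If `weights` were frequency weights, both fits would report the same standard errors.

```r
# Simulated data; all names are illustrative.
set.seed(42)
d <- data.frame(x = rnorm(30), f = sample(1:4, 30, replace = TRUE))
d$y <- 1 + 0.5 * d$x + rnorm(30)

fit_w   <- lm(y ~ x, data = d, weights = f)    # integer counts passed as weights
d_long  <- d[rep(seq_len(nrow(d)), d$f), ]     # repeat row i exactly f[i] times
fit_rep <- lm(y ~ x, data = d_long)            # unweighted fit on expanded data

all.equal(coef(fit_w), coef(fit_rep))          # point estimates agree

df.residual(fit_w)                             # 28: counts the 30 original rows
df.residual(fit_rep)                           # sum(d$f) - 2: counts expanded rows

se_w   <- sqrt(diag(vcov(fit_w)))
se_rep <- sqrt(diag(vcov(fit_rep)))
cbind(se_w, se_rep)                            # standard errors differ
```

The coefficients match, but `lm` keeps the residual degrees of freedom at the number of rows it was given, so its standard errors are larger than those from the expanded data. That is the behaviour expected of analytic weights rather than frequency weights.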
