
I would like to estimate covariate effects on a response variable whose values lie in [0,1], endpoints included. I would like to use the fractional logit model described by Papke and Wooldridge (1996); see below:

http://faculty.smu.edu/millimet/classes/eco6375/papers/papke%20wooldridge%201996.pdf

Is there an R function (or library) to facilitate estimation of the fractional logit model? Could I modify glm() in some way?

Edited question starts here

I appreciate @Jilber's comment: it recovers the estimated betas from the fractional logit model fine. However, as @Ben pointed out, the SEs won't be correctly estimated under this specification.

I suppose this model is more popular in economics, which is why it is well discussed by Stata Journal contributors: http://fmwww.bc.edu/EC-C/S2013/823/EC823.S2013.nn06.slides.pdf and http://www.stata.com/meeting/germany10/germany10_buis.pdf

I was able to obtain the data from the Papke and Wooldridge 401k plan example (see below). It appears to me that the robustness in the fractional logit model comes from the sandwich estimator of variance, equation (9) of Papke and Wooldridge. That said, equation (10) goes on to show how valid SEs may also be obtained by scaling the estimated vcov matrix from a standard glm(...,family=binomial(link=logit)) fit by a dispersion estimate computed from the squared Pearson residuals.
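To make the two variance estimators concrete, here is how I read them, written out as a sketch. The notation is my own assumption for illustration (fitted mean G_i, its derivative g_i, raw residual u_i, row vector of covariates x_i, N observations, K coefficients) and may not match the paper symbol for symbol:

```latex
% Assumed notation: \hat{G}_i = G(x_i \hat{\beta}), \hat{g}_i = G'(x_i \hat{\beta}),
% \hat{u}_i = y_i - \hat{G}_i, x_i a 1 x K row vector.
\hat{A} = \sum_{i=1}^{N} \frac{\hat{g}_i^{2}\, x_i' x_i}{\hat{G}_i (1 - \hat{G}_i)}, \qquad
\hat{B} = \sum_{i=1}^{N} \frac{\hat{u}_i^{2}\, \hat{g}_i^{2}\, x_i' x_i}{[\hat{G}_i (1 - \hat{G}_i)]^{2}}

% Sandwich (robust) form
\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{A}^{-1} \hat{B} \hat{A}^{-1}

% Scaled GLM form, built from Pearson residuals \tilde{u}_i
\tilde{u}_i = \frac{\hat{u}_i}{\sqrt{\hat{G}_i (1 - \hat{G}_i)}}, \qquad
\hat{\sigma}^{2} = \frac{1}{N - K} \sum_{i=1}^{N} \tilde{u}_i^{2}, \qquad
\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^{2} \hat{A}^{-1}
```

For the logit link g = G(1 - G), so A reduces to the sum of G_i(1 - G_i) x_i'x_i and B to the sum of u_i^2 x_i'x_i. As far as I can tell, the scaled form is only valid when the conditional variance really is proportional to G(1 - G), whereas the sandwich form does not need that.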

The slides by Buis seem to implement a sandwich form of the fractional logit estimator using the option vce(robust). The resulting SEs match exactly those obtained by applying the sandwich() function in R to the standard binomial GLM. I assume, but am not sure since I'm not a Stata wiz, that this is the same as the plain robust option used in Baum's slides. If anyone owns Stata and could check, that would be helpful. The family=quasibinomial GLM gives very slightly different SE estimates, but it too seems to be a reasonable estimator of both the mean and variance parameters of the fractional logit model.

Below is some R code which replicates the data fit given in the Buis article above (it also shows how the quasi-binomial model gives slightly different SE estimates):

##
## Replicate what some Stata Journal contributors call "fractional logit"
## Get the data from: "http://fmwww.bc.edu/repec/bocode/k/k401.dta"
## and adjust the local path in read.dta() below
##
library(sandwich)
library(foreign)

X <- read.dta("F:/ProportionsDepVar/k401.dta")
class(X)
names(X)
dim(X)
X$totemp1 <- X$totemp/10000

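## family=binomial with a fractional response warns about "non-integer #successes";
## the coefficient estimates are still the quasi-likelihood estimates
glmfit <- glm(prate ~ mrate + totemp1 + age + sole, family=binomial(link=logit), data=X)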
summary(glmfit)

##
## The model-based SEs reported by summary() are off here (biased large)
## Use the sandwich estimator instead
##
sand_vcov <- sandwich(glmfit)
sand_se <- sqrt(diag(sand_vcov))
robust_z <- coef(glmfit)/sand_se
robust_z

##
## Quasi-binomial fit: same coefficients, SEs close to (but not identical to) the sandwich SEs
##
flogit1 <- glm(prate ~ mrate + totemp1 + age + sole, family=quasibinomial(link=logit), data=X)
summary(flogit1)

So... thanks @Ben for the useful suggestions. My take is that either the sandwich package or family=quasibinomial does a good job of estimating robust SEs for the fractional logit model in R (as defined by equations (9) and (10) of Papke and Wooldridge, respectively). I would appreciate comments or criticisms if this conclusion is not correct.
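For anyone without the k401 data, here is a minimal base-R sketch of the same comparison on simulated data. Everything in it is an assumption made for illustration: the variable names, the beta-distributed outcome, and the hand-rolled sandwich, which I believe corresponds to what sandwich() computes for a logit-link GLM but have only written out from the formulas.

```r
# Simulated fractional response; names and data-generating process are illustrative only
set.seed(1)
n  <- 500
x  <- rnorm(n)
mu <- plogis(0.5 + 1.0 * x)             # true conditional mean, strictly inside (0, 1)
y  <- rbeta(n, mu * 5, (1 - mu) * 5)    # fractional outcome with E[y | x] = mu

# Bernoulli quasi-likelihood fit; quasibinomial avoids the non-integer warning
fit <- glm(y ~ x, family = quasibinomial(link = "logit"))

X  <- model.matrix(fit)
mh <- fitted(fit)                        # fitted means
u  <- y - mh                             # raw residuals
K  <- ncol(X)

# "Bread": inverse of sum_i mh_i (1 - mh_i) x_i x_i'
A_inv <- solve(crossprod(X * sqrt(mh * (1 - mh))))
# "Meat": sum_i u_i^2 x_i x_i'
B <- crossprod(X * u)

# Sandwich SEs (no small-sample correction)
se_robust <- sqrt(diag(A_inv %*% B %*% A_inv))

# Scaled GLM SEs: dispersion from squared Pearson residuals times the bread
sigma2   <- sum(u^2 / (mh * (1 - mh))) / (n - K)
se_quasi <- sqrt(diag(sigma2 * A_inv))

print(cbind(estimate = coef(fit), se_robust, se_quasi))
```

The se_quasi column should reproduce the standard errors printed by summary(fit), and se_robust is what I would expect sqrt(diag(sandwich(fit))) to return if the sandwich package is installed. How far apart the two columns are depends on how well the variance is described by a constant multiple of mu(1 - mu).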

user227710
Chris
  • From [these instructions for Stata](http://www.econ.msu.edu/faculty/papke/Papke_Wooldridge1996flogitinstructions.pdf) (on [Prof. Papke's web page](http://www.econ.msu.edu/faculty/papke/)) I think you could use something like this: `glm(Y ~ X1 + X2, data = your.df, family = binomial(link = "logit"))` – Jilber Urbina Nov 10 '13 at 18:11
  • @Jilber, I don't think so. The second form given there (with an estimated scale parameter) would be `family=quasibinomial`. The first would involve some form of robust estimation; I haven't looked at Papke and Wooldridge carefully enough to know exactly how the robustness is incorporated. You can get robust SE estimation via the `sandwich` package; you can do robust GLM fitting via the `robust` package ... – Ben Bolker Nov 10 '13 at 18:15
  • This is almost a "can I find code" question rather than a "here's my programming problem" one; I'm interested and sympathetic but tempted to close it. It might be appropriate for StackExchange if rephrased as "what is the relationship between Papke and Wooldridge's suggested approach and the various available tools in R (`quasi-` families, `sandwich` package, `robust` package) ...?" – Ben Bolker Nov 10 '13 at 18:20
  • also possibly useful: http://www.stata.com/statalist/archive/2005-11/msg00502.html – Ben Bolker Nov 10 '13 at 18:27
  • Thanks for useful suggestions @Ben. I think either `quasi-` or `sandwich` work for easy implementation of fractional logit in R. Btw...your `bbmle` package/vignettes were cool reads about similar problems. – Chris Nov 11 '13 at 03:13
  • You may get better results on [CrossValidated](http://stats.stackexchange.com), but this question is apparently too old to be migrated there. – smci Jul 10 '15 at 01:34
  • Can we use family=binomial if we know that the dependent variable does not follow a binomial distribution? The dependent variable in my data very likely follows a beta distribution. How do I fit a fractional response model in R? – oivemaria Aug 19 '15 at 20:48
  • `quasibinomial` relaxes the mean-variance relationship that is implied in a binomial regression. By using `quasibinomial` you allow for overdispersion. If you are interested in working on the log-odds scale, the `sandwich` estimates with `family = binomial()` would be the way I would go. Or, work on the log-risk scale by using `family = poisson()` and a `sandwich` estimator to account for model misspecification. [Check this article](http://aje.oxfordjournals.org/content/159/7/702.short) for a reference. – Peter Nov 03 '15 at 04:20

0 Answers