Fractional Response Regression in R

Question

I am trying to model data in which the response variable lies between 0 and 1, so I have decided to use a fractional response model in R. From my current understanding, the fractional response model is similar to logistic regression, but it uses a quasi-likelihood method to estimate the parameters. I am not sure I understand it correctly.
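To make the quasi-likelihood idea concrete, here is a minimal sketch on simulated data (the data-generating process and all variable names are hypothetical, chosen only for illustration). It maximizes the Bernoulli quasi-log-likelihood, sum(y*log(mu) + (1 - y)*log(1 - mu)) with mu = plogis(X %*% beta), directly with optim(), and compares the result with glm(..., family = quasibinomial("logit")). Both solve the same estimating equations X'(y - mu) = 0 as logistic regression, except that y is allowed to be any value in [0, 1].

```r
set.seed(1)

# Hypothetical fractional data: Beta-distributed y with mean plogis(eta)
n  <- 500
x1 <- rnorm(n)
x2 <- runif(n)
mu_true <- plogis(-0.5 + 0.8 * x1 + 1.2 * x2)
y  <- rbeta(n, shape1 = 10 * mu_true, shape2 = 10 * (1 - mu_true))

X <- cbind(1, x1, x2)

# Negative Bernoulli quasi-log-likelihood with a logistic mean;
# plogis(..., log.p = TRUE) keeps log(mu) and log(1 - mu) finite
neg_qll <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * plogis(eta, log.p = TRUE) +
       (1 - y) * plogis(eta, lower.tail = FALSE, log.p = TRUE))
}

# Its gradient: -X'(y - mu), the (negated) quasi-score
neg_score <- function(beta) {
  mu <- plogis(drop(X %*% beta))
  -drop(crossprod(X, y - mu))
}

fit_optim <- optim(rep(0, ncol(X)), neg_qll, neg_score,
                   method = "BFGS", control = list(reltol = 1e-12))

# Same model through glm's quasi-likelihood machinery
fit_glm <- glm(y ~ x1 + x2, family = quasibinomial("logit"))

cbind(optim = fit_optim$par, glm = coef(fit_glm))
```

The two columns should agree to several decimal places; quasibinomial only changes how the dispersion (and hence the default standard errors) is handled, not the point estimates.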

So far I have tried frm() from the frm package and glm() on the following data, which is the same as in this OP:

library(foreign)
mydata <- read.dta("k401.dta")

I also followed the procedure in this OP, which uses glm. However, on the same dataset, frm returns different standard errors:

library(frm)
y <- mydata$prate
x <- mydata[,c('mrate', 'age', 'sole', 'totemp1')]
myfrm <- frm(y, x, linkfrac = 'logit')

frm returns:

*** Fractional logit regression model ***

           Estimate Std. Error t value Pr(>|t|)    
INTERCEPT  1.074062   0.048902  21.963    0.000 ***
mrate      0.573443   0.079917   7.175    0.000 ***
age        0.030895   0.002788  11.082    0.000 ***
sole       0.363596   0.047595   7.639    0.000 ***
totemp1   -0.057799   0.011466  -5.041    0.000 ***

Note: robust standard errors

Number of observations: 4734 
R-squared: 0.124

With glm, I use:

myglm <- glm(prate ~ mrate + totemp1 + age + sole, data = mydata, family = quasibinomial('logit'))
summary(myglm)

Call:
glm(formula = prate ~ mrate + totemp1 + age + sole, family = quasibinomial("logit"), 
    data = mydata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.1214  -0.1979   0.2059   0.4486   0.9146  

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.074062   0.047875  22.435  < 2e-16 ***
mrate        0.573443   0.048642  11.789  < 2e-16 ***
totemp1     -0.057799   0.011912  -4.852 1.26e-06 ***
age          0.030895   0.003148   9.814  < 2e-16 ***
sole         0.363596   0.051233   7.097 1.46e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 0.2913876)

    Null deviance: 1166.6  on 4733  degrees of freedom
Residual deviance: 1023.7  on 4729  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 6
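For context, the Std. Error column printed by summary() for a quasibinomial glm is not a robust (sandwich) standard error. It is the model-based one, sqrt(phi * diag((X'WX)^{-1})) with W = diag(mu(1 - mu)), where phi is the dispersion reported above (Pearson chi-square divided by the residual degrees of freedom). That is why it can differ noticeably from frm's robust standard errors, as it does for mrate. A minimal base-R sketch on simulated data (hypothetical) that reproduces summary()'s standard errors by hand:

```r
set.seed(2)

# Hypothetical fractional response
n  <- 400
x  <- rnorm(n)
mu <- plogis(0.3 + 0.7 * x)
y  <- rbeta(n, 10 * mu, 10 * (1 - mu))

fit <- glm(y ~ x, family = quasibinomial("logit"))

X    <- model.matrix(fit)
mu_h <- fitted(fit)
w    <- mu_h * (1 - mu_h)                # binomial variance function V(mu)

# Dispersion: Pearson chi-square / residual degrees of freedom
phi <- sum((y - mu_h)^2 / w) / df.residual(fit)

# Conventional (model-based) covariance: phi * (X' W X)^{-1}
V_conv     <- phi * solve(crossprod(X, w * X))
se_by_hand <- sqrt(diag(V_conv))

cbind(by_hand = se_by_hand,
      summary = coef(summary(fit))[, "Std. Error"])
```

The two columns should match up to glm's convergence tolerance.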

Which one should I rely on? Is it better to use glm instead of frm, given that (as in the linked OP) the estimated SEs can differ?

score 11 · Accepted Answer · edited May 23 '17 at 12:31

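The differences between the two approaches stem from different choices in computing the standard errors: summary() on a glm reports conventional, dispersion-based standard errors rather than robust ones, and robust estimators in turn differ in their degree-of-freedom corrections. With matching settings, the results are identical. See the following example: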

library(foreign)
library(frm)
library(sandwich)
library(lmtest)

df <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta")
df$prate <- df$prate/100  # prate is stored as a percentage; rescale to [0, 1]

y <- df$prate
x <- df[,c('mrate', 'age', 'sole', 'totemp')]

myfrm <- frm(y, x, linkfrac = 'logit')

*** Fractional logit regression model ***

           Estimate Std. Error t value Pr(>|t|)    
INTERCEPT  0.931699   0.084077  11.081    0.000 ***
mrate      0.952872   0.137079   6.951    0.000 ***
age        0.027934   0.004879   5.726    0.000 ***
sole       0.340332   0.080658   4.219    0.000 ***
totemp    -0.000008   0.000003  -2.701    0.007 ***

Now the GLM:

myglm <- glm(prate ~ mrate + totemp + age + sole, 
             data = df, family = quasibinomial('logit'))
# HC0 sandwich standard errors (no small-sample correction), as used by frm
coeftest(myglm, vcov.=vcovHC(myglm, type="HC0"))

z test of coefficients:

                 Estimate    Std. Error z value              Pr(>|z|)    
(Intercept)  0.9316994257  0.0840772572 11.0815 < 0.00000000000000022 ***
mrate        0.9528723652  0.1370808798  6.9512     0.000000000003623 ***
totemp      -0.0000082352  0.0000030489 -2.7011              0.006912 ** 
age          0.0279338963  0.0048785491  5.7259     0.000000010291017 ***
sole         0.3403324262  0.0806576852  4.2195     0.000024488075931 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

With HC0, the standard errors are identical; that is, frm uses HC0 by default. See this post for an extensive discussion. The default used by vcovHC() in sandwich (HC3) is probably better in some situations, though I suspect it does not matter much in general. You can already see this in your results: for most coefficients the differences are numerically small.
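For a fractional logit, the HC0 sandwich has a simple closed form: with W = diag(mu_i(1 - mu_i)), V_HC0 = (X'WX)^{-1} X' diag((y_i - mu_i)^2) X (X'WX)^{-1}, and the dispersion cancels out. HC1 only rescales this by n/(n - k), which is one example of a degree-of-freedom correction. The sketch below computes both in base R on simulated data (hypothetical) next to the conventional summary() standard errors. With sandwich installed, vcovHC(fit, type = "HC0") should return the same HC0 matrix, but that comparison is not run here.

```r
set.seed(3)

# Hypothetical fractional response with one continuous and one binary regressor
n  <- 300
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
mu <- plogis(0.2 + 0.5 * x1 - 0.4 * x2)
y  <- rbeta(n, 8 * mu, 8 * (1 - mu))

fit <- glm(y ~ x1 + x2, family = quasibinomial("logit"))

X    <- model.matrix(fit)
mu_h <- fitted(fit)
k    <- ncol(X)

# Bread: (X' W X)^{-1} with W = diag(mu (1 - mu))
bread <- solve(crossprod(X, mu_h * (1 - mu_h) * X))
# Meat: X' diag((y - mu)^2) X, the sum of outer products of per-observation scores
meat  <- crossprod(X, (y - mu_h)^2 * X)

V_HC0 <- bread %*% meat %*% bread
V_HC1 <- V_HC0 * n / (n - k)   # degree-of-freedom correction

data.frame(
  conventional = coef(summary(fit))[, "Std. Error"],
  HC0          = sqrt(diag(V_HC0)),
  HC1          = sqrt(diag(V_HC1))
)
```

HC1 differs from HC0 only by the constant factor sqrt(n/(n - k)), which is close to 1 when n is large relative to k. HC3, the vcovHC() default, instead down-weights each observation's contribution according to its leverage.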

score 1 · Answer 2 · answered Jun 02 '16 at 07:01

You need to divide the prate variable by 100 (it is a percentage, while a fractional response must lie in [0, 1]). You might also need to upgrade your version of frm.

tchakravarty