3

I'm interested in comparing estimates from different quantiles (same outcome, same covariates) using anova.rqlist function called by anova in the environment of the quantreg package in R. However the math in the function is beyond my rudimentary expertise. Lets say i fit 3 models at different quantiles;

library(quantreg)
data(Mammals) # data in quantreg to be used as a useful example
fit1 <- rq(weight ~ speed + hoppers + specials, tau = .25, data = Mammals)
fit2 <- rq(weight ~ speed + hoppers + specials, tau = .5, data = Mammals)
fit3 <- rq(weight ~ speed + hoppers + specials, tau = .75, data = Mammals)

Then i compare them using;

anova(fit1, fit2, fit3, test="Wald", joint=FALSE)

My question is which is of these models is being used as the basis of the comparison?

My understanding of the Wald test (wiki entry)

enter image description here

where θ^ is the estimate of the parameter(s) of interest θ that is compared with the proposed value θ0.

So my question is what is the anova function in quantreg choosing as the θ0?

Based on the pvalue returned from the anova my best guess is that it is choosing the lowest quantile specified (ie tau=0.25). Is there a way to specify the median (tau = 0.5) or better yet the mean estimate from obtained using lm(y ~ x1 + x2 + x3, data)?

anova(fit1, fit2, fit3, joint=FALSE)

actually produces

Quantile Regression Analysis of Deviance Table

Model: weight ~ speed + hoppers + specials
Tests of Equality of Distinct Slopes: tau in {  0.25 0.5 0.75  }

             Df Resid Df F value  Pr(>F)  
speed         2      319  1.0379 0.35539  
hoppersTRUE   2      319  4.4161 0.01283 *
specialsTRUE  2      319  1.7290 0.17911  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

while

anova(fit3, fit1, fit2, joint=FALSE)

produces the exact same result

Quantile Regression Analysis of Deviance Table

Model: weight ~ speed + hoppers + specials
Tests of Equality of Distinct Slopes: tau in {  0.5 0.25 0.75  }

             Df Resid Df F value  Pr(>F)  
speed         2      319  1.0379 0.35539  
hoppersTRUE   2      319  4.4161 0.01283 *
specialsTRUE  2      319  1.7290 0.17911  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The order of the models is clearly being changed in the anova, but how is it that the F value and Pr(>F) are identical in both tests?

lmo
  • 37,904
  • 9
  • 56
  • 69
  • Note this question is partly motivated by my reading of the book(let) "Quantile Regression" by Lingxin Hao and Daniel Q. Naiman. On pages 49,50 and 60 they detail how bootstrapped quantile estimates can be compared across quantiles using Wald test. They expressly describe how to a quantile of interest can be compared to median or lowest or nearest quantiles, which got me thinking about what the anova function in the quantreg package is doing for non bootstrapped estimates and what quantile is chosen as the reference for such comparisons. – JustGettinStarted Sep 23 '15 at 15:52

1 Answers1

2

All the quantiles you input are used and there is not one model used as a reference.

I suggest you read this post and the related answer to understand what your "theta.0" is.

I believe what you are trying to do is to test whether the regression lines are parallel. In other words whether the effects of the predictor variables (only income here) are uniform across quantiles.

You can use the anova() from the quantreg package to answer this question. You should indeed use several fits for each quantile.

When you use joint=FALSE as you did, you get coefficient-wise comparisons. But you only have one coefficient so there is only one line! And your results tells you that the effect of income is not uniform accross quantiles in your example. Use several predictor variables and you will get several p-values.

You can do an overall test of equality of the entire sets of coefficients if you do not use joint=FALSE and that would give you a "Joint Test of Equality of Slopes" and therefore only one p-value.

EDIT:

I think theta.0 is the average slope for all 'tau' values or the actual estimate from 'lm()', rather than a specific slope of any of the models. My reasoning is that 'anova.rq()' does not require any specific low value of 'tau' or even the median 'tau'.

There are several ways to test this. Either do the calculations by hand with theta.0 being equal to the average value, or compare many combinations because then you could a situation where certain of your models are close to the model with a low 'tau' values but not to the 'lm()' value. So if theta.0 is the slope of the first model with lowest 'tau' then your Pr(>F) will be high whereas in the other case, it will be low.

This question should maybe have been asked on cross-validated.

Community
  • 1
  • 1
Julian Wittische
  • 1,219
  • 14
  • 22
  • 1) The post you link was informative: it states "Our parameter of interest is denoted θ0 and this is usually 0 as we want to test whether the coefficient differs from 0 or not". I am not interested in the coefficients differ from 0 (the pvalue in `summary.rq(fit1)` will tell me that) What im trying to test is whether the effects of the predictors at e.g tau=0.1 and tau=0.9 are significantly different from tau=0.5. In base `anova` the first model specified in the argument is used as θ0, why is this not the case with `anova.rq` and how can i make it so? – JustGettinStarted Sep 24 '15 at 15:00
  • 2) adjusted example to include multiple covariates so `joint=FALSE` can be applied. For my application i am interested in how the effects of one specific predictor varies at different quantiles. Even including model with multiple covariates, changing the order of the models in the arguments for `anova.rq` does not change the outcome. – JustGettinStarted Sep 24 '15 at 15:04
  • You can already tell that the effects of the predictor is different for different tau values. If you want information about particular 'tau' values then you can always use the original 'anova()' function I guess. – Julian Wittische Sep 25 '15 at 03:03
  • In response to your edit. I suspect actually that `anova.rq` uses all possible combinations of θ.hat and θ0 to test whether θhat - θ0 = 0 and NOT if θhat = θ0. In this way there is no reference (θ0) per se, because every tau is tested against every other tau, in all combinations. Im guessing this from Koenker's book "Quantile Regression" 2005 pg 75-76, as well the coding in `anova.rq` as revealed by `getAnywhere(anova.rq)`. It would explain why the ordering of models changes nothing, and `lm` is not called by `anova.rq` – JustGettinStarted Sep 25 '15 at 17:40