1

I'm trying to include contest-specific variables in my data analysis using the BradleyTerry2 package in R.3.3.1 (I also tried with R.2.11.1 to compare with an older version of BradleyTerry2). The problem I face is that my predictor variables are not properly taken into account. The example below shows you my problem, using the CEMS data to illustrate my point.

    CEMS.BTmodel_01 <- BTm(outcome = cbind(win1.adj, win2.adj),
        player1 = school1, 
        player2 = school2, 
        formula = ~ .. + WOR[student] * LAT[..], 
        refcat = "Stockholm", 
        data = CEMS)
    summary(CEMS.BTmodel_01)

With this model, we get an AIC = 5837.4, an interaction estimated to LAT[..] * WOR[student] = 0.85771

Now, if I add a new school (Toulouse, LAT = 1) at the top of the list

    Toulouse <- c(1,0,0,0,0,0,0)
    Barcelona <- c(0,1,0,0,0,0,0)
    London <- c(0,0,1,0,0,0,0)
    Milano <- c(0,0,0,1,0,0,0)
    Paris <- c(0,0,0,0,1,0,0)
    St.Gallen <- c(0,0,0,0,0,1,0)
    Stockholm <- c(0,0,0,0,0,0,1)
    LAT <- c(1,1,0,1,1,0,0)
    schools <- data.frame(Toulouse, Barcelona, London, Milano, Paris, St.Gallen, Stockholm, LAT)
    rownames(schools) <- c("Toulouse", "Barcelona", "London", "Milano", "Paris", "St.Gallen", "Stockholm")
    CEMS$schools <- schools

I would expect to get the same result from the analysis, because the new school does not appear in the dataset. But I actually get AIC = 5855.8, an interaction LAT[..] * WOR[student] = 0.13199

Playing around with the data, it looks that the names of my predictor variables (here the names of the schools) are not properly taken into account and matched with my comparison data (here the pairwise comparisons from european students). Instead, it's their order that matters.

What I am doing wrong?

Heather Turner
  • 3,264
  • 23
  • 30
C. Rigal
  • 11
  • 3

1 Answers1

0

The rows of CEMS$schools should match the levels of the school1 and school2 factors (the rownames of CEMS$schools are not actually used in the code; the first row should match the first level etc). So you need to update the levels of school1 and school2:

CEMS$preferences <-
within(CEMS$preferences, {
    school1 <- factor(school1, rownames(CEMS$schools))
    school2 <- factor(school2, rownames(CEMS$schools))
    })

CEMS.BTmodel_02 <- BTm(outcome = cbind(win1.adj, win2.adj),
                   player1 = school1, 
                   player2 = school2, 
                   formula = ~ .. + WOR[student] * LAT[..], 
                   refcat = "Stockholm", 
                   data = CEMS)

Now the models are equivalent as expected:

> CEMS.BTmodel_01
Bradley Terry model fit by glm.fit 

Call:  BTm(outcome = cbind(win1.adj, win2.adj), player1 = school1, player2 = school2, 
    formula = ~.. + WOR[student] * LAT[..], refcat = "Stockholm", 
    data = CEMS)

Coefficients  [contrasts:  ..=contr.treatment ]:
        ..Barcelona                 ..London                 ..Milano  
             0.5044                   1.6037                   0.3538  
            ..Paris              ..St.Gallen          WOR[student]yes  
             0.8741                   0.5268                       NA  
            LAT[..]  WOR[student]yes:LAT[..]  
                 NA                   0.8577  
Degrees of Freedom: 4454 Total (i.e. Null);  4448 Residual
  (91 observations deleted due to missingness)
Null Deviance:      5499 
Residual Deviance: 4912     AIC: 5837

> CEMS.BTmodel_02
Bradley Terry model fit by glm.fit 

Call:  BTm(outcome = cbind(win1.adj, win2.adj), player1 = school1, player2 = school2, 
    formula = ~.. + WOR[student] * LAT[..], refcat = "Stockholm", 
    data = CEMS)

Coefficients  [contrasts:  ..=contr.treatment ]:
         ..Toulouse              ..Barcelona                 ..London  
                 NA                   0.5044                   1.6037  
           ..Milano                  ..Paris              ..St.Gallen  
             0.3538                   0.8741                   0.5268  
    WOR[student]yes                  LAT[..]  WOR[student]yes:LAT[..]  
                 NA                       NA                   0.8577 
Degrees of Freedom: 4454 Total (i.e. Null);  4448 Residual
  (91 observations deleted due to missingness)
Null Deviance:      5499 
Residual Deviance: 4912     AIC: 5837
Heather Turner
  • 3,264
  • 23
  • 30
  • Great, it works well and the results are much better now. I also realize that the same method must also be applied to the other covariate matrix (the "students" matrix, in the CEMS example). – C. Rigal Aug 30 '16 at 07:19