I'm trying to include contest-specific variables in my data analysis using the BradleyTerry2 package in R.3.3.1 (I also tried with R.2.11.1 to compare with an older version of BradleyTerry2). The problem I face is that my predictor variables are not properly taken into account. The example below shows you my problem, using the CEMS data to illustrate my point.
CEMS.BTmodel_01 <- BTm(outcome = cbind(win1.adj, win2.adj),
player1 = school1,
player2 = school2,
formula = ~ .. + WOR[student] * LAT[..],
refcat = "Stockholm",
data = CEMS)
summary(CEMS.BTmodel_01)
With this model, we get an AIC = 5837.4, an interaction estimated to LAT[..] * WOR[student] = 0.85771
Now, if I add a new school (Toulouse, LAT = 1) at the top of the list
Toulouse <- c(1,0,0,0,0,0,0)
Barcelona <- c(0,1,0,0,0,0,0)
London <- c(0,0,1,0,0,0,0)
Milano <- c(0,0,0,1,0,0,0)
Paris <- c(0,0,0,0,1,0,0)
St.Gallen <- c(0,0,0,0,0,1,0)
Stockholm <- c(0,0,0,0,0,0,1)
LAT <- c(1,1,0,1,1,0,0)
schools <- data.frame(Toulouse, Barcelona, London, Milano, Paris, St.Gallen, Stockholm, LAT)
rownames(schools) <- c("Toulouse", "Barcelona", "London", "Milano", "Paris", "St.Gallen", "Stockholm")
CEMS$schools <- schools
I would expect to get the same result from the analysis, because the new school does not appear in the dataset. But I actually get AIC = 5855.8, an interaction LAT[..] * WOR[student] = 0.13199
Playing around with the data, it looks that the names of my predictor variables (here the names of the schools) are not properly taken into account and matched with my comparison data (here the pairwise comparisons from european students). Instead, it's their order that matters.
What I am doing wrong?