My problem is a persistent "computationally singular" error when fitting models with the mlogit package in R.
First, a little bit about my data:
My data concerns predicting choice in the context of a sports draft. Each team makes an ordered selection from the same pool of players, with team and player attributes. Thus, in the language of mlogit, each "team" is an individual, and each "player" an alternative. To provide an oversimplified example, say five teams each chose a player.
Pick  Player          PPG   Age  Team
1     Ben Simmons     19.2  19   PHI
2     Brandon Ingram  17.3  18   PHI
3     Jaylen Brown    14.6  19   PHI
5     Kris Dunn       16.4  21   PHI
6     Buddy Hield     25.0  22   PHI
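In case it helps with reproduction, here is a minimal base R sketch of how the long-format input for this toy example could be built. The name `test` matches the object passed to mlogit.data below; the team order and the rule that the first remaining player is always the one picked are assumptions that hold only for this oversimplified example.

```r
# Player pool in draft order (values copied from the table above)
players <- data.frame(
  Pick   = c(1, 2, 3, 5, 6),
  Player = c("Ben Simmons", "Brandon Ingram", "Jaylen Brown",
             "Kris Dunn", "Buddy Hield"),
  PPG    = c(19.2, 17.3, 14.6, 16.4, 25.0),
  Age    = c(19, 18, 19, 21, 22),
  stringsAsFactors = FALSE
)
teams <- c("PHI", "LAL", "BOS", "MIN", "NOP")  # assumed draft order

# Each team chooses from the players not yet taken, so the choice set shrinks
test <- do.call(rbind, lapply(seq_along(teams), function(i) {
  pool <- players[i:nrow(players), ]
  pool$Team <- teams[i]
  # Toy assumption: the first remaining player is the one selected
  pool$picked <- seq_len(nrow(pool)) == 1
  pool
}))
test <- test[, c("picked", "Pick", "Player", "PPG", "Age", "Team")]
rownames(test) <- NULL
```

This gives one row per team/available-player pair, with exactly one TRUE in `picked` per team.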
I'm attempting to use the mlogit package. I first use mlogit.data to reformat my data.
Choices <- mlogit.data(test,
                       choice = "picked",
                       shape = "long",
                       id.var = "Team",
                       alt.var = "Player",
                       chid.var = "Team",
                       varying = c(4:5))
The result looks like:
                    picked  Pick  Player          PPG   Age  Team
PHI.Ben Simmons     TRUE    1     Ben Simmons     19.2  19   PHI
PHI.Brandon Ingram  FALSE   2     Brandon Ingram  17.3  18   PHI
PHI.Jaylen Brown    FALSE   3     Jaylen Brown    14.6  19   PHI
PHI.Kris Dunn       FALSE   5     Kris Dunn       16.4  21   PHI
PHI.Buddy Hield     FALSE   6     Buddy Hield     25.0  22   PHI
LAL.Brandon Ingram  TRUE    2     Brandon Ingram  17.3  18   LAL
LAL.Jaylen Brown    FALSE   3     Jaylen Brown    14.6  19   LAL
LAL.Kris Dunn       FALSE   5     Kris Dunn       16.4  21   LAL
LAL.Buddy Hield     FALSE   6     Buddy Hield     25.0  22   LAL
BOS.Jaylen Brown    TRUE    3     Jaylen Brown    14.6  19   BOS
BOS.Kris Dunn       FALSE   5     Kris Dunn       16.4  21   BOS
BOS.Buddy Hield     FALSE   6     Buddy Hield     25.0  22   BOS
MIN.Kris Dunn       TRUE    5     Kris Dunn       16.4  21   MIN
MIN.Buddy Hield     FALSE   6     Buddy Hield     25.0  22   MIN
NOP.Buddy Hield     TRUE    6     Buddy Hield     25.0  22   NOP
Obviously, I have a lot more players and variables but that's the basic structure.
I then try to run a conditional logit regression:
mlogit(picked ~ <regvar>, data = Choices)
I repeatedly encounter the following error:
Error in solve.default(H, g[!fixed]) :
  system is computationally singular: reciprocal condition number = 3.72907e-23
A solution I have seen elsewhere suggests removing highly correlated variables to eliminate the non-invertibility. However, this doesn't solve my issue. The error persists, with a different reciprocal condition number, even in simple two-variable models like the example data, where correlations are low. In fact, it even occurs with a single regressor!
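For concreteness, this is roughly the kind of collinearity check I mean, sketched in base R on the toy values from the example. The data frame `long` and the helper names are made up for illustration, and demeaning within each team's choice set is my own assumption about what is relevant for a conditional logit, not something taken from the mlogit documentation.

```r
# Toy long-format data: each team sees only the players still on the board
ppg   <- c(19.2, 17.3, 14.6, 16.4, 25.0)
age   <- c(19, 18, 19, 21, 22)
teams <- c("PHI", "LAL", "BOS", "MIN", "NOP")
idx   <- unlist(lapply(1:5, function(i) i:5))   # players left at each pick
long  <- data.frame(Team = rep(teams, times = 5:1),
                    PPG  = ppg[idx],
                    Age  = age[idx])

# Plain pairwise correlation between the two regressors
r <- cor(long$PPG, long$Age)

# Demean each regressor within its choice set (Team), then check rank
X  <- as.matrix(long[, c("PPG", "Age")])
Xc <- apply(X, 2, function(col) col - ave(col, long$Team))
rank_within <- qr(Xc)$rank                # equals ncol(X) if not collinear
cond_within <- kappa(Xc, exact = TRUE)    # very large values hint at trouble
```

If `rank_within` equals the number of regressors and `cond_within` is moderate, plain collinearity among the regressors looks like an unlikely cause, which is what I see in my data.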
Simplified version of what I'm trying to do:
model <- mlogit(picked ~ PPG + Age,
                data = Choices)
Perhaps it's just a tolerance issue, but given that these variables are not especially correlated, that would be surprising. It seems that something more subtle than variable correlation is at fault. I have also checked whether separating individual-specific and alternative-specific variables helps. It does not: for example, adding a team-specific variable such as "Team_MSA_Size" does not change anything:
model <- mlogit(picked ~ PPG + Age | Team_MSA_Size,
                data = Choices)
Is there something about my data structure or a failure to use mlogit syntax correctly that is leading to this? How would I go about fixing it?
I did find this similar-seeming topic, but I had trouble following it without the data in question. Is the accepted answer suggesting that every choice situation must have exactly the same set of alternatives? If so, that would be deeply unfortunate for me, since each team necessarily sees a different list as players are removed by earlier selections. If that's the problem, is there an easy fix here, or does that put it in code-your-own-estimator territory?
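To make the shrinking choice sets explicit, here is a small base R sketch that tabulates which players each team can choose from in the toy example. The data frame `long` and the variable names are hypothetical and used only for illustration.

```r
# Toy long-format data: one row per team/available-player pair
teams   <- c("PHI", "LAL", "BOS", "MIN", "NOP")
players <- c("Ben Simmons", "Brandon Ingram", "Jaylen Brown",
             "Kris Dunn", "Buddy Hield")
idx  <- unlist(lapply(1:5, function(i) i:5))   # players left at each pick
long <- data.frame(Team   = rep(teams, times = 5:1),
                   Player = players[idx])

# Number of alternatives each team sees
set_size <- table(long$Team)

# Availability matrix: TRUE where a player is in a team's choice set
avail <- table(long$Team, long$Player) > 0

# TRUE only if every team faces the full set of alternatives
balanced <- all(avail)
```

Here `balanced` is FALSE and `set_size` differs by team, so if identical choice sets are required, my data cannot satisfy that by construction.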
I can happily provide more data or other details if it would be helpful.
EDIT: Someone requested toy data. Here is a csv with toy data, and below is the code that produces the error.
setwd("Filepath")
library(mlogit)
toy_data <- read.csv("toy_data.csv",header = TRUE)
Choices_test <- mlogit.data(toy_data,
                            choice = "picked",
                            shape = "long",
                            id.var = "Team",
                            alt.var = "Pick",
                            chid.var = "Team")
mlogit(picked ~ as.factor(Position) + as.factor(Black) + Age + PPG + APG + RPG + Team_WS,
       data = Choices_test)
Error in solve.default(H, g[!fixed]) :
  system is computationally singular: reciprocal condition number = 6.53305e-21