This is my first post on stack overflow so please be indulgent!
I am trying to develop code for a multinomial (or conditional) logit model (as described in http://data.princeton.edu/wws509/notes/c6s3.html). This type of model can be seen as an extension of classical binary logistic regression. Multinomial logit models are frequently used for discrete choice modelling.
In the code below, I describe a model explaining a choice between 2 options (J) when all the participants (N) face the same choice questions (T). I've already developed a similar piece of code in R and it works very well, but slowly (because of the "optim" routine). It appears to be more difficult to develop this type of code in Python because of overflow issues. The first operation (i.e., numpy.dot(X,beta)) can lead to very large/small values which are then "incompatible" with numpy.exp(). As indicated in this blog post, overflow issues tend to occur when abs(np.dot(X,beta)) > 500. The initial overflow has important consequences because it leads to 0, NaN and Inf values in the rest of the code, thus generating further issues (e.g., np.log(0) or number/Inf). I spent many hours trying to fix this with the decimal module, the bigfloat module, etc., but none of it seemed to work (Deal with overflow in exp using numpy).
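For illustration, here is a minimal demonstration of the overflow, and of the standard "log-sum-exp" trick (available as scipy.special.logsumexp) that computes log(sum(exp(v))) without ever exponentiating the raw values:

```python
import numpy as np
from scipy.special import logsumexp

# np.exp() overflows to inf for large arguments...
with np.errstate(over='ignore'):
    print(np.exp(800.0))  # inf

# ...but log(sum(exp(v))) can still be computed stably by factoring
# out the maximum: logsumexp(v) = m + log(sum(exp(v - m))), m = max(v)
v = np.array([800.0, 801.0])
print(logsumexp(v))  # 801.3132616875...
```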
The second-best option seems to be to modify the likelihood function by clipping the linear predictor (e.g. num[num > 400] = 400) and then using the "Nelder-Mead" algorithm to minimise the log-likelihood. Ideally I would use the "BFGS" algorithm, but it is incompatible with the clipping.
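A quick sketch of why clipping breaks gradient-based optimisers like BFGS: beyond the clip threshold the function is constant, so its gradient is exactly zero and BFGS sees a flat surface there (the threshold 400 and point 500 below are just illustrative):

```python
import numpy as np

def clipped_exp(x):
    # exp() of a predictor clipped at 400, as in the question
    return np.exp(np.minimum(x, 400.0))

# Finite-difference slope beyond the clip threshold is exactly zero,
# so a gradient-based method gets no directional information there.
h = 1e-6
slope = (clipped_exp(500.0 + h) - clipped_exp(500.0 - h)) / (2 * h)
print(slope)  # 0.0
```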
Any ideas about how I can fix the overflow issues in my code?
Remark: Very large np.dot(X,beta) values don't seem to be an issue in R, MATLAB and STATA.
Your help is much appreciated!
PYTHON CODE
import numpy as np
import scipy.optimize as sc

N = len(set(data[:,0]))   # number of participants
T = 20                    # number of choice questions per participant
J = 2                     # number of options
Y = np.reshape(data[:,12], (N*T, J))
X = np.matrix(data[:,[33,10,5,6,7,8,9]])

def fn_mnl(beta):
    num = np.dot(X, beta)
    num[num > 400] = 400          # clip to avoid overflow in np.exp()
    num[num < -400] = -400        # clip to avoid underflow in np.exp()
    num = np.exp(num)
    num = num.reshape(N*T, J)
    den = np.sum(num, axis=1)
    prbj = num/den                # choice probabilities
    prbi = prbj[Y == 1]           # probability of the chosen option
    prbi[prbi <= 0] = 0.0000001   # floor to avoid np.log(0)
    llik = np.sum(np.log(prbi))
    return -llik

sv = [0]*X.shape[1]               # starting values
res = sc.minimize(fn_mnl, sv, method='Nelder-Mead')
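For comparison, here is a sketch of a numerically stable version of the same likelihood, written in log space with scipy.special.logsumexp so that the raw utilities are never exponentiated; it works with "BFGS". The data below are synthetic stand-ins for the real `data` array (all sizes and names are illustrative):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, T, J, K = 50, 20, 2, 3            # illustrative sizes, not the real data
X = rng.normal(size=(N * T * J, K))  # stand-in design matrix
true_beta = np.array([1.0, -2.0, 0.5])

# Simulate choices from the model itself
util = (X @ true_beta).reshape(N * T, J)
p = np.exp(util - logsumexp(util, axis=1, keepdims=True))
choice = np.array([rng.choice(J, p=row) for row in p])
Y = np.zeros((N * T, J))
Y[np.arange(N * T), choice] = 1

def fn_mnl_stable(beta):
    v = (X @ beta).reshape(N * T, J)  # utilities, possibly huge
    # log P(chosen) = v_chosen - logsumexp(v): no exp() of raw utilities,
    # so neither clipping nor a probability floor is needed
    logprb = v - logsumexp(v, axis=1, keepdims=True)
    return -np.sum(logprb[Y == 1])

res = minimize(fn_mnl_stable, np.zeros(K), method='BFGS')
print(res.x)  # estimates should be close to true_beta
```

Because the objective is smooth everywhere, gradient-based minimisers no longer hit the flat clipped regions, and the log-space formulation also removes the need for the prbi floor.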
R CODE
LOGIT = function(Beta, DATA, X, Y){
    num = exp(as.matrix(DATA[,X]) %*% as.vector(Beta))
    mat1 = matrix(num, ncol=2, byrow=T)
    prbj = mat1/rowSums(mat1)
    prbi = prbj[Y==1]
    logl = sum(log(prbi))
    return(-logl)
}