
I am using the multinom function from the nnet package for multinomial logistic regression. My dataset has 3 features and 14 different classes, with 1000 observations in total. The classes I have are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15.

I divide the data set into a proper training set and a calibration set, where the calibration set contains only one class of labels (say 4). The training set has all classes except 4. Now, I train the model as

modelfit <- multinom(label ~ x1+x2+x3, data = train)

Now, I use the calibration data to find the predicted probabilities as:

predProb = predict(modelfit, newdata=calib_set, type="prob")

where calib_set has only the three features and no Y column. For every observation in the calibration data, predProb then gives me the probabilities of all classes except class 11.
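One thing worth checking (the data isn't shown here, so this is an assumption): whether `label` is stored as a factor, and which levels that factor carries when `multinom()` is fitted. A class can only appear as a prediction column if it was a level of the training response. A minimal sketch with made-up labels:

```r
# Sketch (made-up labels): subsetting a factor does not drop its levels,
# but building the factor *after* subsetting (or calling droplevels) does,
# so inspect levels() on the column actually passed to multinom().
label <- factor(c(0, 1, 4, 11, 13, 15), levels = c(0:11, 13, 15))
train_label <- label[label != 4]
levels(train_label)              # still includes "4" and "11"
levels(droplevels(train_label))  # "4" (and any other unseen class) is gone
```

If a class has been dropped from the factor this way, refitting with the full level set declared is one way to get a column for it back (though a class with no training rows can only ever receive a near-zero probability).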

Also, when I use any test data point, I get predicted probabilities of all classes except class 11. Can someone explain why that class is missing, and how I can get predicted probabilities for all classes?

The picture below shows the predicted probabilities for the calibration data; class 11 is missing (classes 12 and 14 may legitimately be absent, since they are not among my classes). Any suggestions or advice are much appreciated.
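Not a fix for the missing column itself, but following the reference-class idea raised in the comments: if class 11 were being treated as a baseline, its probability would be recoverable, because the class probabilities in each row must sum to 1. A sketch with a made-up probability matrix (not output from the model above):

```r
# Hypothetical probability matrix with one class column omitted.
# Since the class probabilities of each row sum to 1, the missing
# column is 1 minus the row sums of the reported columns.
predProb <- matrix(c(0.2, 0.5, 0.1,
                     0.3, 0.3, 0.6),
                   nrow = 3,
                   dimnames = list(NULL, c("0", "1")))
missing_prob  <- 1 - rowSums(predProb)
predProb_full <- cbind(predProb, "11" = missing_prob)
rowSums(predProb_full)  # each row now sums to 1
```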

PRITI
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Usually there's always a reference class to avoid overfitting. The probability of the last class is probably just 1 minus the sum of all the others. Hard to say without actually being able to run the code ourselves. – MrFlick Feb 17 '22 at 03:13
  • `df=data.frame(x1=c(5,52,63,74,85,96,97,98,29,110),x2=c(23,24,22,21,25,56,57,68,89,98),x3=c(41,42,43,44,45,46,47,48,49,70),y=c(0,1,5,4,3,2,9,8,3,2))`. Say the data looks something like that. Although, I understand the point of the baseline class. So, is there a way to calculate the baseline class's predicted probability? – PRITI Feb 17 '22 at 03:35
  • `index = sort(sample(nrow(df), nrow(df)*.90)); t_c <- df[index,]; test_ <- df[-index,] ## Instead need automated; calibrationSet = t_c[t_c$label=="3",]; train = t_c[t_c$label!="3",]; calibLabels = calibrationSet[,ncol(calibrationSet),drop=FALSE]; calib_set = calibrationSet[,-ncol(calibrationSet),drop=TRUE]; testLabels = test_[,ncol(test_),drop=FALSE]; test_set = test_[,-ncol(test_),drop=TRUE]; modelfit <- multinom(label ~ ., data = train); predProb = predict(modelfit, newdata=calib_set, type="prob")` – PRITI Feb 17 '22 at 03:42
  • From your description, it seems that the response variables in your train and test sets have different levels. How is this expected to work? – cdalitz Feb 17 '22 at 08:02

0 Answers