I am getting the following error when calling predict on a model that predicts the probability of choosing each of a set of mutually exclusive binary outcomes. The model is fit with the multinom function from the nnet package.

Error in predict.multinom(model_name, df.predict, "probs") :
  NAs are not allowed in subscripted assignments
In addition: Warning message:
'newdata' had 5 rows but variables found have 100 rows

Here is a reproducible example:

require(nnet)

response1 <- sample(runif(100))
response2 <- 1-response1
responses <- as.matrix(data.frame(response1 = response1, response2 = response2))

train <- data.matrix(data.frame(var1 = runif(100), var2 = runif(100)))

multinom.mod <- multinom(responses ~ train)

test.df <- data.frame(var1 = runif(5), var2 = runif(5))
predict.vec <- predict(multinom.mod, test.df)

As you can see, my response consists of 2 variables. It appears that when I predict on fewer rows than were in the training set, the function tries to join the response variables from the training set with the test set.

UPDATE:

The following works with a new predict set. However, the response variables are being treated as categorical variables and so the prediction is incorrect:

require(nnet)

# response1 must exist before data.frame() can use it to build response2
response1 <- runif(100)
train <- data.frame(response1 = response1, response2 = 1 - response1, var1 = runif(100), var2 = runif(100))

multinom.mod <- multinom(response1 + response2 ~ ., train, type = "probs")

test.df <- data.frame(var1 = runif(5), var2 = runif(5))

predict.vec <- predict(multinom.mod, test.df)
matsuo_basho
  • That error message means whatever you passed to `newdata` didn't match the names used in the formula, which usually means you made a typo or you used things like `mydata$Y ~ mydata$X1 + mydata$X2` in the formula. Can you show the code used to fit the model and the `predict()` call? – Gavin Simpson Jul 06 '17 at 23:52
  • The code used to fit the model is: fit_retail<-multinom(target_retail~train_). Perhaps this is the problem, that I need to specify train_ in the data argument of the multinom function, as well. – matsuo_basho Jul 07 '17 at 00:14
  • Also, if the result of `names(df.logit.model.ready)==names(data.frame(train_))` (where df.logit.model.ready is the predict df and train_ is the train df) are all TRUEs, then how can the column names not match up? – matsuo_basho Jul 07 '17 at 00:23
  • When asking a question you should provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can run and test the code to see what's going on. – MrFlick Jul 07 '17 at 00:48
  • @MrFlick, added a reproducible example – matsuo_basho Jul 07 '17 at 06:02

1 Answer

If you would like to predict the probability of each category of response, you should use:

predict.vec <- predict(multinom.mod, test.df, type = "probs")

Otherwise, predict returns the predicted class, since the default is type = "class".

Update: a complete example (training and predicting) should look like this:

require(nnet)

response1 <- sample(runif(100))
response2 <- 1 - response1

train <- data.frame(var1 = runif(100), var2 = runif(100))
# train with matrix
responses <- cbind(response1, response2)
multinom.mod <- multinom(responses ~ var1 + var2, train, type = "probs")
# train with category
train$response <- ifelse(response1 > response2, "response1", "response2")
multinom.mod1 <- multinom(response ~ var1 + var2, train)

test.df <- data.frame(var1 = runif(5), var2 = runif(5))
# no matter which training method you use,
# you can predict class (default) or probability
predict.cvec <- predict(multinom.mod, test.df, type = "class")
predict.pvec <- predict(multinom.mod, test.df, type = "probs")

predict.cvec1 <- predict(multinom.mod1, test.df, type = "class")
predict.pvec1 <- predict(multinom.mod1, test.df, type = "probs")
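
To connect this back to the error in the question, here is a minimal sketch of why the original fit could not be used with a new data frame. It assumes the cause is the one suggested in the comments: when a whole matrix is placed on the right-hand side of the formula, the only predictor term is the name of that matrix, so predict has no var1 or var2 to look up in newdata. The names train.mat, fit.mat and fit.df are made up for this illustration.

```r
library(nnet)
set.seed(1)

response1 <- runif(100)
responses <- cbind(response1 = response1, response2 = 1 - response1)
train <- data.frame(var1 = runif(100), var2 = runif(100))
train.mat <- data.matrix(train)  # hypothetical name, mirrors `train` in the question

# Matrix on the right-hand side: the single predictor term is the object name.
fit.mat <- multinom(responses ~ train.mat, trace = FALSE)
attr(fit.mat$terms, "term.labels")  # "train.mat"

# Named columns plus the data argument: the terms match the columns of newdata.
fit.df <- multinom(responses ~ var1 + var2, data = train, trace = FALSE)
attr(fit.df$terms, "term.labels")   # "var1" "var2"

test.df <- data.frame(var1 = runif(5), var2 = runif(5))
probs <- predict(fit.df, test.df, type = "probs")
```

With fit.mat, predict would look for a variable called train.mat, not find it in test.df, and fall back to the 100-row object in the workspace, which is consistent with the warning about 5 rows versus 100 rows.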
Consistency
  • The problem is that if the train set is a dataframe, the model treats the response as categorical, even while specifying `type="probs"` as argument in the multinom function. This is consistent with the documentation: _a formula expression as for regression models, of the form response ~ predictors. The response should be a factor or a matrix with K columns, which will be interpreted as counts for each of K classes._ – matsuo_basho Jul 09 '17 at 12:13
  • Okay, I think I understand your problem. The model training method in your Update is not correct. I will edit my answer to explain how to use the command. – Consistency Jul 09 '17 at 14:20
  • This is great, thank you very much! Such a small fix for a problem that consumed so much time. – matsuo_basho Jul 09 '17 at 19:11