I recently asked a question about the error "requires numeric/complex matrix/vector arguments" when working with the neuralnet library. Here is my original question: "Working with neuralnet in R for the first time: get "requires numeric/complex matrix/vector arguments" but don't know how to correct".
The solution was to convert the factors in my data frame to "dummy" variables using the model.matrix function. The resulting code was the following:
matrix.train <- model.matrix(
  ~ survived + pclass + sex + age + sibsp + parch + fare + embarked,
  data = train
)
Because my source data frame has individual NA values scattered throughout, the resulting matrix ends up with 714 rows rather than the 891 rows of the original data frame.
This is OK for my training data. However, when I load my test data frame and convert it to a matrix, I run into the same issue. This time I get 331 matrix rows vs the 418 rows in my source data frame.
After I run compute, applying the model to my test data, I'm unable to cbind my predictions back to my test data because the row counts are different. So, my question is:
Is there a way to force model.matrix to output the same number of rows as the source data frame, keeping the rows that contain NA rather than dropping them? My model will need to handle NA and still output a prediction, because rows with at least one NA are common. Alternatively, would it be better to tell neuralnet to treat NA values as valid factor levels?
Here is the code I've been attempting to use so far:
# Build a matrix from training data (714 rows vs 891 rows due to NAs in data)
matrix.train <- model.matrix(
  ~ survived + pclass + sex + age + sibsp + parch + fare + embarked,
  data = train
)
library(neuralnet)
# Train the neural net
net <- neuralnet(
  survived ~ pclass + sexmale + age + sibsp + parch + fare + embarkedC +
    embarkedQ + embarkedS,
  data = matrix.train, hidden = 10, threshold = 0.01
)
# Build a matrix from test data (331 rows vs 418 rows due to NAs in data)
matrix.test <- model.matrix(
  ~ pclass + sex + age + sibsp + parch + fare + embarked,
  data = test
)
# Apply neural net to test matrix
net.results <- compute(net, matrix.test)
# Attempt to map results back to original test data
cleanoutput <- cbind(net.results$net.result, test)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 331, 418
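One way around the row-count mismatch, sketched here on toy data, relies on the fact that model.matrix keeps the original row names of the rows it retains. The predictions can then be written back into an NA-filled vector of full length. The data frame `test_toy` and the stand-in predictions `pred_complete` are hypothetical and only illustrate the indexing; a real model's output would take their place.

```r
# Toy stand-in for the test data; column names and values are illustrative only.
test_toy <- data.frame(
  age  = c(22, NA, 26, 35),
  sex  = factor(c("male", "female", "female", NA)),
  fare = c(7.25, 71.28, 7.92, 53.10)
)

# Under the default na.action (na.omit), model.matrix drops incomplete rows
# but keeps the original row names of the rows that survive.
mm <- model.matrix(~ age + sex + fare, data = test_toy)

# Hypothetical predictions, one per surviving row (a real model would go here).
pred_complete <- rowSums(mm)

# Start with NA for every source row, then fill in the rows that were kept.
pred_all <- rep(NA_real_, nrow(test_toy))
pred_all[match(rownames(mm), rownames(test_toy))] <- pred_complete

# The row counts now agree, so cbind works.
out <- cbind(prediction = pred_all, test_toy)
```

Rows that contained an NA simply get an NA prediction, so this aligns the output but does not by itself produce predictions for incomplete rows.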
When I try to use the rownames from the train data frame to force matrix.train into the same row count, I get the following:
matrix.train <- matrix.train[match(rownames(train),rownames(matrix.train)),]
> matrix.train
  (Intercept) survived pclass sexmale   age sibsp parch    fare embarkedC embarkedQ embarkedS
1           1        0      3       1 22.00     1     0  7.2500         0         0         1
2           1        1      1       0 38.00     1     0 71.2833         1         0         0
3           1        1      3       0 26.00     0     0  7.9250         0         0         1
4           1        1      1       0 35.00     1     0 53.1000         0         0         1
5           1        0      3       1 35.00     0     0  8.0500         0         0         1
6          NA       NA     NA      NA    NA    NA    NA      NA        NA        NA        NA
7           1        0      1       1 54.00     0     0 51.8625         0         0         1
However, that row of NAs is inaccurate. The original row may contain only one NA value, but whenever a row has even a single NA, the whole row comes back as NA. Instead of the above, this is what I would like to see:
> matrix.train
  (Intercept) survived pclass sexmale   age sibsp parch    fare embarkedC embarkedQ embarkedS
1           1        0      3       1 22.00     1     0  7.2500         0         0         1
2           1        1      1       0 38.00     1     0 71.2833         1         0         0
3           1        1      3       0 26.00     0     0  7.9250         0         0         1
4           1        1      1       0 35.00     1     0 53.1000         0         0         1
5           1        0      3       1 35.00     0     0  8.0500         0         0         1
6           1        0      3       1    NA     0     0    6.25         1         0        NA
7           1        0      1       1 54.00     0     0 51.8625         0         0         1
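A matrix shaped like the desired output above can likely be obtained by building the model frame explicitly with na.action = na.pass and passing that frame to model.matrix. The following is a minimal sketch on toy data; `train_toy` and its values are hypothetical, and only base R functions (model.frame, model.matrix, na.pass) are used.

```r
# Toy stand-in for the training data; values are illustrative only.
train_toy <- data.frame(
  survived = c(0, 1, 1, 0),
  age      = c(22, 38, NA, 35),
  embarked = factor(c("S", "C", "S", NA))
)

# Build the model frame with na.pass so rows containing NA are not dropped.
mf <- model.frame(~ survived + age + embarked, data = train_toy,
                  na.action = na.pass)

# model.matrix uses a supplied model frame as-is, so every row is kept and
# NA appears only in the columns derived from the missing values.
mm_all <- model.matrix(~ survived + age + embarked, data = mf)

nrow(mm_all) == nrow(train_toy)
```

Note that this only preserves the rows. The NA cells would presumably still need to be imputed or otherwise handled before training, since arithmetic on NA inputs propagates NA.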