
I'm learning to work with neural networks in R. As a learning exercise, I've been using the following problem over at Kaggle:

Don't worry, this problem is specifically designed for people to learn with, there's no reward tied to it.

I started with a simple logistic regression, which was great for getting my feet wet. Now I'd like to learn to work with neural networks. One row of my training data looks like this (column: value):

- survived: 1
- pclass:   3
- sex:      male
- age:      22.0
- sibsp:    1
- parch:    0
- ticket:   PC 17601
- fare:     7.25
- cabin:    C85
- embarked: S

My starting R code looks like this:

> net <- neuralnet(survived ~ pclass + sex + age + sibsp +
                   parch + ticket + fare + cabin + embarked, 
                   train, hidden=10, threshold=0.01)

When I run this line of code I get the following error:

Error in neurons[[i]] %*% weights[[i]] : 
  requires numeric/complex matrix/vector arguments

I understand that the problem is in the way I'm presenting my input variables but I'm too much of a novice to understand what I need to do to correct this. Can anyone help?

Thanks!

cchamberlain
user2548029
    Looking at your data, I believe you have to convert all of it to numerical values. E.g., cabin=C85: what does that mean? If you convert values of this type to numeric, your problem will be resolved. – user1471980 Jul 03 '13 at 20:06

2 Answers


Before blindly giving the data to the computer, it is a good idea to look at it:

d <- read.csv("train.csv")
str(d)
# 'data.frame': 891 obs. of  12 variables:
#  $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
#  $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
#  $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
#  $ Name       : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
#  $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
#  $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
#  $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
#  $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
#  $ Ticket     : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
#  $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
#  $ Cabin      : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
#  $ Embarked   : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
summary(d)

Some of the variables have too many distinct values to be useful (at least in your first model): you can remove Name, Ticket, Cabin and PassengerId. You may also want to convert some of the numeric variables (say, Pclass) to factors, if that is more meaningful.
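As a rough sketch of that cleanup step, the snippet below drops the identifier-like columns and turns the passenger class into a factor. The small data frame and its values are made up for illustration (only the column names come from the `str` output above), and `drop_cols` and `d_small` are hypothetical names.

```r
# Toy stand-in for the Kaggle training file; the values are invented.
d <- data.frame(
  PassengerId = 1:4,
  Survived    = c(0L, 1L, 1L, 0L),
  Pclass      = c(3L, 1L, 3L, 2L),
  Name        = c("A", "B", "C", "D"),
  Sex         = c("male", "female", "female", "male"),
  Ticket      = c("T1", "T2", "T3", "T4"),
  Cabin       = c("", "C85", "", ""),
  stringsAsFactors = FALSE
)

# Drop columns that have (almost) one distinct value per passenger.
drop_cols <- c("PassengerId", "Name", "Ticket", "Cabin")
d_small <- d[, !(names(d) %in% drop_cols)]

# Treat passenger class as a category rather than a number.
d_small$Pclass <- factor(d_small$Pclass)

str(d_small)
```

Whether `Pclass` works better as a number or as a factor is a modelling choice; both are worth trying.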

Since neuralnet only deals with quantitative variables, you can convert all the qualitative variables (factors) to binary ("dummy") variables, with the model.matrix function -- it is one of the very rare situations in which R does not perform the transformation for you.

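library(neuralnet)

# Expand factors into 0/1 dummy columns; rows with missing values (e.g. Age) are dropped.
m <- model.matrix( 
  ~ Survived + Pclass + Sex + Age + SibSp + Parch + Fare + Embarked, 
  data = d 
)
head(m)  # check the generated column names (Sexmale, EmbarkedC, ...)

# The formula must use the dummy column names from m, not the original factor names.
r <- neuralnet( 
  Survived ~ Pclass + Sexmale + Age + SibSp + Parch + Fare + EmbarkedC + EmbarkedQ + EmbarkedS, 
  data=m, hidden=10, threshold=0.01
)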
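To see where names such as `Sexmale` come from, here is a minimal sketch on a made-up data frame (`toy` and `mm` are hypothetical names, and the values are invented). It assumes R's default `na.action` option (`na.omit`), under which `model.matrix` silently drops rows that contain missing values.

```r
# Made-up data: two factors, one numeric column with a missing value.
toy <- data.frame(
  Survived = c(0, 1, 1, 0, 1),
  Sex      = factor(c("male", "female", "female", "male", "female")),
  Age      = c(22, 38, 26, NA, 35),
  Embarked = factor(c("S", "C", "S", "S", "Q"))
)

# Each factor with k levels becomes k - 1 dummy columns named <variable><level>;
# the first level ("female", "C") acts as the reference and gets no column.
mm <- model.matrix(~ Survived + Sex + Age + Embarked, data = toy)

colnames(mm)
nrow(mm)  # one row fewer than toy, because the row with Age = NA is dropped
```

Printing `colnames(mm)` before writing the `neuralnet` formula avoids guessing the dummy names.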
Vincent Zoonekynd
    Thank you! This is exactly the type of response I was hoping for. Thanks for taking the time to respond in such detail. – user2548029 Jul 04 '13 at 04:05
  • Thank you for this! I have a question @VincentZoonekynd, is there a definite rule to what variables are applicable to the `model.matrix` conversion? –  Jan 09 '14 at 12:10
    @llorgge: all qualitative variables, i.e., those of type `factor` (or `character`), will be transformed to dummy variables. But since numeric variables are kept, untransformed, you can actually put all the variables. – Vincent Zoonekynd Jan 09 '14 at 15:50
    Thank you! Last question: what is the maximum number of factor levels you would recommend using in `neuralnet` after `model.matrix`? I am afraid of losing some of my factor variables, which have between two and 200 values. –  Jan 09 '14 at 20:50
    @llorgge: You should probably ask on [cross-validated](http://stats.stackexchange.com/). – Vincent Zoonekynd Jan 09 '14 at 23:58
  • Thank you for the detailed response!! Truly helped. – Sarun Dahal Nov 30 '19 at 18:28
  • For me, this error happens in `predict` function! `neuralnet` works fine and fits the trainset. However, neither the training set nor the test set can be predicted by the model due to the same issue of `numeric/complex matrix/vector arguments`. Any idea? @VincentZoonekynd I converted both using `model.matrix` and their column names match. – Hadij Dec 21 '20 at 07:55

The error message "requires numeric/complex matrix/vector arguments" occurs when you have factor or character variables in your data.

There are three ways to solve this problem:

  1. Delete the variable.
  2. If the variable is an ordered factor, use its integer codes instead.
  3. If the variable is a character, convert it to a factor and then to dummy variables.

You can use model.matrix(), mentioned above, or the class.ind() function from the nnet package to convert a factor into dummy variables.
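A minimal sketch of options 2 and 3, using invented values and hypothetical variable names (`size`, `size_codes`, `embarked`, `dummies`):

```r
library(nnet)  # provides class.ind()

# Option 2: an ordered factor can be replaced by its integer codes.
size <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"), ordered = TRUE)
size_codes <- as.integer(size)  # low = 1, medium = 2, high = 3

# Option 3: a character variable becomes a factor, then an indicator matrix.
embarked <- factor(c("S", "C", "S", "Q"))
dummies <- class.ind(embarked)

size_codes
dummies
```

Unlike `model.matrix`, `class.ind` produces one indicator column per level, with no reference level left out.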

Tara
  • This surprises me, since SPSS requires you to put factor variables separately in the designated box. – Espanta Jul 10 '15 at 12:49