
I am new to R and I am trying to grow a decision tree.

Here is some of my data set:

Malo   Edad   Sexo      nivel_estudios    Estado Civil
1       35    Femenino  Secundaria         Union Libre
0       48    Femenino  Bachillerato       Casado
0       45    Masculino Bachillerato       Casado
1       27    Femenino  Bachillerato       Union Libre

When I try to execute this piece of code:

tree_model= tree(Malo~., trainingSet)

Here Malo is my binary (0/1) integer column that classifies each observation as good or bad, and trainingSet is a random partition of my data set.

I keep on getting this warning:

Warning message:
In tree(Malo ~ ., trainingSet) : NAs introduced by coercion

I don't understand why I am getting this. Help would be greatly appreciated.

user2521067
  • Can you tell me what class(Malo) returns? – Aashu Jul 17 '14 at 15:47
  • You have not included any sample data to make this problem [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Please update your question or there's little we can do to help but offer random guesses. – MrFlick Jul 17 '14 at 15:55
  • Added the data set. @Aashu: class(Malo) is Integer. – user2521067 Jul 17 '14 at 16:42

3 Answers


The documentation for the formula argument of tree() states that:

The left-hand-side (response) should be either a numerical vector when a regression tree will be fitted or a factor, when a classification tree is produced. The right-hand-side should be a series of numeric or factor variables separated by +; there should be no interaction terms. Both . and - are allowed: regression trees can have offset terms.

A simple example of how NAs are introduced by coercion:

as.numeric(c('1','b','2')) 
[1]  1 NA  2 
Warning message: 
NAs introduced by coercion 

I hope this makes the problem clear: in your formula tree(Malo~., trainingSet), the numeric response is combined with string (character) columns (Sexo, nivel_estudios, Estado Civil) as predictors. Strings cannot be coerced to numbers, so they become NA.
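To check which columns are affected, something along these lines may help. It is only a sketch: the data frame below is a hypothetical reconstruction of the four sample rows from the question, and it assumes the text columns were read in as character rather than factor.

```r
# Hypothetical reconstruction of the sample rows shown in the question.
trainingSet <- data.frame(
  Malo = c(1L, 0L, 0L, 1L),
  Edad = c(35L, 48L, 45L, 27L),
  Sexo = c("Femenino", "Femenino", "Masculino", "Femenino"),
  nivel_estudios = c("Secundaria", "Bachillerato", "Bachillerato", "Bachillerato"),
  `Estado Civil` = c("Union Libre", "Casado", "Casado", "Union Libre"),
  check.names = FALSE,       # keep the space in "Estado Civil"
  stringsAsFactors = FALSE   # assumption: text columns are plain character
)

# Class of every column, as a named character vector.
col_classes <- sapply(trainingSet, class)
print(col_classes)

# Names of the columns that are still character.
char_cols <- names(col_classes)[col_classes == "character"]
print(char_cols)

# Coercing a character column to numeric yields only NAs
# (the warning is suppressed here to keep the output clean).
coerced <- suppressWarnings(as.numeric(trainingSet$Sexo))
print(coerced)
```

Any column listed in char_cols is a candidate for the coercion warning.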

Aashu

You may want to apply the as.factor() function to the last three columns of the data set. For example:

trainingSet$Sexo = as.factor(trainingSet$Sexo)

trainingSet$nivel_estudios = as.factor(trainingSet$nivel_estudios)

You may also need to rename the "Estado Civil" column to something like "EstadoCivil" or "Estado.Civil" so that as.factor() can be applied to it in the same way.

This way, the data meets the criterion "The right-hand-side should be a series of numeric or factor variables separated by +".
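If there are many text columns, the same conversion can be done in one pass. This is a minimal sketch using base R only; the data frame is again a hypothetical reconstruction of the sample rows, and make.names() is just one possible way to get a syntactic column name.

```r
# Hypothetical reconstruction of the sample rows shown in the question.
trainingSet <- data.frame(
  Malo = c(1L, 0L, 0L, 1L),
  Edad = c(35L, 48L, 45L, 27L),
  Sexo = c("Femenino", "Femenino", "Masculino", "Femenino"),
  nivel_estudios = c("Secundaria", "Bachillerato", "Bachillerato", "Bachillerato"),
  `Estado Civil` = c("Union Libre", "Casado", "Casado", "Union Libre"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Turn every column name into a syntactic one: "Estado Civil" -> "Estado.Civil".
names(trainingSet) <- make.names(names(trainingSet))

# Convert all character columns to factors; numeric columns are left alone.
is_char <- sapply(trainingSet, is.character)
trainingSet[is_char] <- lapply(trainingSet[is_char], as.factor)

str(trainingSet)
```

After this, every predictor is either numeric or a factor.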

Ting

The response variable (Malo) needs to be a factor for a classification tree, so use:

trainingSet$Malo <- as.factor(trainingSet$Malo)
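A quick way to confirm the conversion worked is sketched below, using a hypothetical two-column data frame built from the sample rows. It also shows one thing to watch for if you later need the 0/1 integers back.

```r
# Hypothetical two-column subset of the sample rows from the question.
trainingSet <- data.frame(
  Malo = c(1L, 0L, 0L, 1L),
  Edad = c(35L, 48L, 45L, 27L)
)

# Convert the 0/1 integer response to a factor.
trainingSet$Malo <- as.factor(trainingSet$Malo)

print(class(trainingSet$Malo))
print(levels(trainingSet$Malo))

# To recover the original integers, go through as.character() first;
# as.integer() applied directly to a factor returns the level codes (1, 2).
original <- as.integer(as.character(trainingSet$Malo))
print(original)
```

With the response stored as a factor, the documentation quoted in the first answer says that a classification tree is produced rather than a regression tree.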