
I want to train a decision tree in MATLAB on binary data. Here is a sample of the data I use: traindata <87*239> [array of data with 239 features]

1 0 1 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 0 0 0 1 1 0 ... [till 239]
1 1 1 0 0 0 1 0 0 0 1 0 1 0 1 1 0 0 1 0 0 0 1 0 1 ... [till 239]
....

The data comes from a form whose questions have only yes/no answers. The outcome of the form is also binary: whether or not a patient has some medical disorder. We have used a classification tree, but the classifier shows us double-valued thresholds. For example, it branches at the first node on whether x137 is bigger than 0.75 or not. Since 0.75 never occurs in our data and has no yes/no meaning, we want a decision tree better suited to our work: one that is trained on boolean variables rather than doubles, and that understands the data is not continuous, so that instead of the representation above it shows whether x137 is yes or no (1 or 0).

Can someone help me with this? I would also appreciate a way to map our data to double variables and features if a boolean decision tree is not applicable. I am currently using classregtree in MATLAB with <87*237> as the training data and <87*1> as the results.

Amir Zadeh

2 Answers


classregtree has an optional input parameter, 'categorical'. Using this option, you can pass in a vector indicating which of your input variables are categorical. The documentation describes it as a vector of column indices, so in your case it would be 1:239, covering every column. The decision tree should then contain yes/no decisions rather than numerical thresholds.
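A minimal sketch of that call, assuming the Statistics Toolbox is installed and using randomly generated 0/1 data as a stand-in for the real form answers (the values of X and y here are made up for illustration). As far as I know, the 'categorical' parameter is documented as a vector of column indices, so 1:size(X, 2) marks every predictor as categorical; 'method' is set explicitly because y is numeric.

```matlab
% Hypothetical stand-in for the 87-by-239 yes/no data (random values).
rng(0);
X = double(rand(87, 239) > 0.5);   % predictors, each 0 or 1
y = double(rand(87, 1) > 0.5);     % outcome, 0 or 1

% Treat every column as an unordered categorical predictor and
% force classification, since a numeric y defaults to regression.
t = classregtree(X, y, ...
    'method', 'classification', ...
    'categorical', 1:size(X, 2));

% Predict on the training rows; for a classification tree this
% returns a cell array of class-name strings.
yhat = eval(t, X);
```

Calling view(t) afterwards opens the graphical tree viewer, where splits on categorical predictors should appear as set memberships (a value in {0} versus {1}) instead of thresholds such as x137 < 0.75. In newer releases, fitctree with 'CategoricalPredictors', 'all' appears to be the recommended replacement for classregtree.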

Sam Roberts
  • +1 here is an [example](http://stackoverflow.com/a/1960734/97160) that shows a classification decision tree with both continuous and discrete features – Amro May 27 '12 at 22:22

From the help of classregtree:

t = classregtree(X,y) creates a decision tree t for predicting the response y as a function of the predictors in the columns of X. X is an n-by-m matrix of predictor values. If y is a vector of n response values, classregtree performs regression. If y is a categorical variable, character array, or cell array of strings, classregtree performs classification.

What's the type of y in your case? It seems that classregtree is doing regression, but you want classification, so y should be a categorical variable.

EDIT: To make your y categorical, you can try "nominal(y)".
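A short sketch of that conversion, assuming the Statistics Toolbox and using random 0/1 data in place of the real data set (the values and the 'no'/'yes' labels are illustrative assumptions, not taken from the question). With a nominal response, classregtree should pick classification without being told to.

```matlab
% Hypothetical stand-in data (random values, for illustration only).
rng(0);
X = double(rand(87, 239) > 0.5);   % predictors, each 0 or 1
y = double(rand(87, 1) > 0.5);     % numeric outcome, 0 or 1

% Convert the numeric outcome to a nominal (categorical) array.
% Labels are assigned to the sorted unique values: 0 -> 'no', 1 -> 'yes'.
yNom = nominal(y, {'no', 'yes'});

% With a categorical response, classification is the default method.
t = classregtree(X, yNom);
treeType = type(t);
```

This only changes how the response is treated. The predictors are still numeric here, so the splits would still be shown as thresholds unless the 'categorical' option is also passed.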

emrea