
So I'm in the middle of writing a decision tree program. Let's say I have a dataset of 1000 instances. As I understand it, with cross-validation I split the dataset into 900/100 groups, each time using a different set of 900 instances to build the tree and the remaining 100 to test it.
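Roughly, here's what I mean in code (just a sketch assuming scikit-learn's KFold; the data is a dummy stand-in):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.random.rand(1000, 5)          # 1000 instances, 5 dummy attributes
    y = np.random.randint(0, 2, 1000)    # dummy binary labels

    # 10 folds over 1000 instances -> ten different 900/100 splits
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    for train_idx, test_idx in kf.split(X):
        print(len(train_idx), len(test_idx))   # 900 100 each time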

What I don't understand is the following:

1. Which tree do I use as my final decision tree? Choosing the one with the least error doesn't seem like a good option, because I guess its low error could just be due to over-fitting.
2. Is cross-validation used only to estimate the error of the final tree?
3. I found several different cross-validation algorithms; some used the same splitting criterion every time, and some used different ones in order to choose the best tree. Can you point me to a good source of information so I can figure out exactly what I need, or explain it yourself?

Thank you!

iddqd

1 Answer


Cross-validation is used to estimate how accurately your model will predict on unseen data. It gives you an error estimate; the final tree is then typically rebuilt on the full dataset rather than taken from one of the folds.
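For example, a minimal sketch of that (assuming scikit-learn; make_classification is just stand-in data so the example runs): cross-validation only produces the error estimate, and the tree you actually keep is refit on all 1000 instances.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data

    tree = DecisionTreeClassifier(max_depth=5)

    # 10-fold CV: ten 900/100 splits, one accuracy score per held-out fold
    scores = cross_val_score(tree, X, y, cv=10)
    print("estimated accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

    # The final model is none of the ten fold-trees: retrain on ALL the data
    final_tree = tree.fit(X, y)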

The best tree should consist of the best classifiers, i.e. the attributes that separate the data well, so you can start building your decision tree using those attributes.
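If you also want cross-validation to help choose the splitting criterion (your question 3), one common pattern is to treat the criterion as a hyperparameter and compare the CV scores of each candidate. A sketch under the same scikit-learn assumption:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data

    # Score each splitting criterion with the same 10-fold CV, keep the best
    for criterion in ("gini", "entropy"):
        scores = cross_val_score(
            DecisionTreeClassifier(criterion=criterion), X, y, cv=10)
        print(criterion, round(scores.mean(), 3))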

I suggest you search Wikipedia and Google to get more info about decision trees.

ogzd
  • I know that the best tree should consist of the attributes that separate the data best; that's the point of a decision tree. There are many ways of deciding which attribute is best (e.g. gain ratio, information gain, Gini index, etc.). My question was: how does cross-validation help me, if it even does, to choose the splitting criterion? – iddqd Feb 08 '13 at 13:22
  • http://stackoverflow.com/questions/2314850/help-understanding-cross-validation-and-decision-trees?rq=1 – ogzd Feb 08 '13 at 13:24