I'm trying to build a decision tree with the C4.5 algorithm for a school project. The tree is for Haberman's Survival Data Set; the attribute information is as follows.
Attribute Information:
1. Age of patient at time of operation (numerical)
2. Patient's year of operation (year - 1900, numerical)
3. Number of positive axillary nodes detected (numerical)
4. Survival status (class attribute)
1 = the patient survived 5 years or longer
2 = the patient died within 5 years
We need to implement a decision tree in which every leaf has exactly one distinct result (meaning the entropy of that leaf should be 0). However, there are six instances that share the same attribute values but have different results.
For example:
66,58,0,2
66,58,0,1
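To make the conflict concrete, here is a minimal Python sketch of how I'm checking for it. The helper names `entropy` and `conflicting_groups` are my own and not part of any C4.5 implementation, and the third row is a hypothetical one added only for illustration. It groups rows by their three attribute values and reports the groups that contain more than one class, along with the entropy such a leaf would have.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * log2(c / total) for c in counts.values())


def conflicting_groups(rows):
    """Group rows by attribute values; keep groups holding more than one class."""
    groups = defaultdict(list)
    for *attrs, label in rows:
        groups[tuple(attrs)].append(label)
    return {a: ls for a, ls in groups.items() if len(set(ls)) > 1}


rows = [
    (66, 58, 0, 2),
    (66, 58, 0, 1),
    (40, 60, 3, 1),  # hypothetical row, for illustration only
]

conflicts = conflicting_groups(rows)
for attrs, labels in conflicts.items():
    # No split on age, year, or nodes can separate these rows,
    # so a leaf containing them cannot reach entropy 0.
    print(attrs, labels, entropy(labels))  # (66, 58, 0) [2, 1] 1.0
```

Since the two rows are identical in every attribute, any leaf that contains them has an entropy of 1 bit, which is why the "entropy 0 in every leaf" requirement cannot be met for them.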
What does the C4.5 algorithm do in this type of situation? I've searched everywhere but couldn't find any information.
Thanks.