I am mining a dataset using WEKA's J48 decision tree algorithm.

I have been trying to understand what the useLaplace parameter does. The only thing I have to go by is this:

Whether counts at leaves are smoothed based on LapLace

which is just the documentation that WEKA provides. I have some questions about it, though:

  1. What are counts at leaves?
  2. What is smoothing?
  3. What is Laplace? Is it an algorithm used for smoothing?

Nothing I have found online goes into detail about what this parameter actually does; it all just says that it "turns on Laplace smoothing."

Haych

1 Answer

Provost and Domingos found that frequency smoothing of the leaf probability estimates, such as the Laplace correction, significantly improves the performance of decision trees. From what I have read, the counts at a leaf are the numbers of training instances of each class that reach that leaf. They are used to compute the leaf's probability estimate, which can be defined as:

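P(class A | x) = TP / (TP + FP)

where TP (true positives) is the number of training instances at the leaf that belong to class A, and FP (false positives) is the number at the leaf that belong to any other class.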
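As a rough illustration of the unsmoothed estimate, here is a minimal Python sketch. The function name `leaf_probability` and the example counts are made up for this answer; this is not WEKA's code, only the relative-frequency formula applied to a dictionary of per-class counts at one leaf.

```python
def leaf_probability(class_counts, target):
    """Raw relative-frequency estimate of P(target | leaf).

    class_counts maps each class label to the number of training
    instances of that class that reached the leaf (hypothetical input).
    """
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("leaf has no training instances")
    return class_counts.get(target, 0) / total


# A leaf reached by only three training instances, all of class A.
counts = {"A": 3, "B": 0}
p_a = leaf_probability(counts, "A")  # TP = 3, FP = 0
p_b = leaf_probability(counts, "B")  # TP = 0, FP = 3
print(p_a, p_b)
```

With so few instances the estimates are pinned to the extremes: class A gets probability 1 and class B gets probability 0, even though three examples are weak evidence for such certainty.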

Smoothing consists of adjusting these raw frequency estimates to reduce the effect of noise and of leaves that contain only a few instances, in order to produce more accurate probability estimates.

The Laplace correction is one such frequency smoothing formula:

P_Laplace(class A | x) = (TP + 1) / (TP + FP + C)

where C is the number of classes in the dataset.
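To see what the correction does numerically, here is a small Python sketch of the formula above. The function name `laplace_probability` and the example counts are hypothetical; it is meant only to illustrate the arithmetic, not to reproduce how WEKA implements the useLaplace option internally.

```python
def laplace_probability(class_counts, target, num_classes):
    """Laplace-corrected estimate: (TP + 1) / (TP + FP + C).

    class_counts maps class labels to training-instance counts at a leaf,
    and num_classes is C, the number of classes in the dataset.
    """
    total = sum(class_counts.values())  # TP + FP
    return (class_counts.get(target, 0) + 1) / (total + num_classes)


# Two pure leaves in a two-class problem: one small, one large.
small_leaf = {"A": 3, "B": 0}
large_leaf = {"A": 300, "B": 0}

p_small = laplace_probability(small_leaf, "A", 2)  # (3 + 1) / (3 + 2)
p_large = laplace_probability(large_leaf, "A", 2)  # (300 + 1) / (300 + 2)
print(p_small, p_large)
```

The small leaf is pulled from 1.0 down to 0.8, while the large leaf stays very close to 1.0. In other words, the correction mostly affects leaves whose counts are too small to justify an extreme probability, and it never outputs exactly 0 or 1.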

lelabo_m
  • So you are saying that if an outlier or an error reaches a leaf, Laplace will try to alleviate the effect it has on the probability of the instances that reach the leaf correctly? I don't understand why this enhances performance. Could you explain that in more detail please? Also, could you please give me the source you read this from? – Haych Mar 02 '16 at 19:57
  • This helped me to understand: http://researchcommons.waikato.ac.nz/handle/10289/5701 – Haych Mar 02 '16 at 21:52