
I'm using WEKA with the "weather.arff" dataset and applied the Naive Bayes classifier with 10-fold cross-validation, as you can see in the snapshot. I understand pretty much everything except the parts that I marked in red in the picture.

There are 9 (yes) + 5 (no) = 14 instances altogether, but here the sums exceed that total. And what do yes (0.63) and no (0.38) mean? Are they related to the performance of the classifier after 10-fold CV?

outlook
  sunny             3.0     4.0
  overcast          5.0     1.0
  rainy             4.0     3.0
  [total]          12.0     8.0

The total here is 20.0, but we have only 14 instances. What are these yes and no counts for sunny, overcast, and rainy? Where do they come from?

What is this weighted sum? How is it calculated, and how does it relate to NB?


desertnaut

2 Answers


There's an explanation of 10-fold cross-validation in "Cross Validation in Weka".

There are 10 randomly selected groups of data divided up into 90% training data and 10% test data. With 14 rows of data to work with, it is likely taking 12 rows for training and 2 rows for testing. After running all 10 tests, there will be 20 results. That makes sense for the data about outlook, but the 18 total for windy brings that theory into question.

I believe the 0.63 and 0.38 at the top of the picture represent the percentage of yes and no answers from the 10 tests.

GregA100k

I found the answer to my question. This is called the "Zero Frequency Problem", and what WEKA does is add 1 to the count of each attribute value for each class. The reason is to avoid zero probabilities: otherwise, when the probabilities are multiplied together, a single zero makes the whole product 0. A zero probability also tells us nothing new about the case. In addition, this has nothing to do with either the number of cross-validation iterations or the CV performance estimate.

outlook                Yes            No
  sunny             (2+1)=3.0     (3+1)=4.0
  overcast          (4+1)=5.0     (0+1)=1.0
  rainy             (3+1)=4.0     (2+1)=3.0
  [total]             12.0           8.0

Actual Instances = 9 + 5 = 14
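
To make the arithmetic in the table concrete, here is a minimal Python sketch (not WEKA's own code) that adds 1 to each raw outlook count and recomputes the totals and conditional probabilities. The raw counts are the ones shown in the table; the names `raw_counts` and `laplace_counts` are just illustrative.

```python
from fractions import Fraction

# Raw outlook counts per class, taken from the table above (14 instances).
raw_counts = {
    "sunny":    {"yes": 2, "no": 3},
    "overcast": {"yes": 4, "no": 0},
    "rainy":    {"yes": 3, "no": 2},
}

def laplace_counts(counts, alpha=1):
    """Add `alpha` to every (value, class) count (hypothetical helper)."""
    return {value: {cls: n + alpha for cls, n in per_class.items()}
            for value, per_class in counts.items()}

smoothed = laplace_counts(raw_counts)

# Column totals after smoothing: 3 values per class, so each total grows by 3.
totals = {cls: sum(per_class[cls] for per_class in smoothed.values())
          for cls in ("yes", "no")}

# P(outlook = value | class), computed from the smoothed counts.
cond_prob = {value: {cls: Fraction(n, totals[cls]) for cls, n in per_class.items()}
             for value, per_class in smoothed.items()}

print(totals)                       # {'yes': 12, 'no': 8}
print(cond_prob["overcast"]["no"])  # 1/8 instead of 0/5
```

Each class total grows by the number of outlook values (3), which is why 9 + 5 = 14 becomes 12 + 8 = 20.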

Another important thing is that WEKA does this for every nominal attribute, not just Outlook: in this case Windy as well, and also Temperature and Humidity if you use the nominal version of the dataset.
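
The same add-one idea seems to explain the other numbers in the output. The sketch below is my own check, not WEKA's code. It assumes the usual raw windy counts for the weather data (TRUE: 3 yes / 3 no, FALSE: 6 yes / 2 no), which are not shown in the question, and it assumes the class priors are smoothed the same way.

```python
# Assumed raw windy counts per class in the standard weather data.
windy_raw = {
    "TRUE":  {"yes": 3, "no": 3},
    "FALSE": {"yes": 6, "no": 2},
}

# Add 1 to each count, then total per class: (3+1)+(6+1) and (3+1)+(2+1).
windy_totals = {cls: sum(per_class[cls] + 1 for per_class in windy_raw.values())
                for cls in ("yes", "no")}
print(windy_totals, sum(windy_totals.values()))  # {'yes': 11, 'no': 7} 18

# Class priors with 1 added to each class count (assumption, see above).
class_counts = {"yes": 9, "no": 5}
n_instances = sum(class_counts.values())   # 14
n_classes = len(class_counts)              # 2
priors = {cls: (count + 1) / (n_instances + n_classes)
          for cls, count in class_counts.items()}
print(priors)  # {'yes': 0.625, 'no': 0.375}
```

The windy total of 18 matches the figure mentioned in the other answer, and 0.625 and 0.375 shown to two decimals would be the yes (0.63) and no (0.38) in the output, so those look like smoothed class priors rather than cross-validation results.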