
Here is the deal: I have created a BN following the instructions in the BNT manual. It is the sprinkler network, but I have added a Class node for Winter and Summer, like this:

 
       Cloudy-----+
      /      \    |
Sprinkler   Rain  |
      \     /  \  |
       Wet     Class

where Class depends only on whether it is cloudy or raining, with the same specification as http://bnt.googlecode.com/svn/trunk/docs/usage.html#basics

The Class node is also binary; the table is:

C R Class prob
---------------
1  1  1    0
2  1  1    0.4
1  2  1    0.4
2  2  1    0.9
etc.

So my question: using var_elim_inf_engine (because for some reason jtree is not working), this is the marg.T I get after entering evidence that it is rainy:

ans =

 0.800000000000000
 0.200000000000000

Is this correct? Did I implement it correctly? Or did I miss something? Thanks.

The same assumption applies: 1 = false = Winter, 2 = true = Summer.

EDIT:

So yes, the CPT of the Class node has 8 entries:

C R Class prob
---------------
1  1  1    0
2  1  1    0.4
1  2  1    0.4
2  2  1    0.9
1  1  2    1
2  1  2    0.6
1  2  2    0.6
2  2  2    0.1

These entries are 1 minus the corresponding Class = 1 entries. The code is:

N = 5;
dag = zeros(N,N);
C = 1; S = 2; R = 3; W = 4; Class = 5;
dag(C, [S R]) = 1;
dag(R, W) = 1;
dag(S, W) = 1;
dag(C, Class) = 1;
dag(R, Class) = 1;
discrete_nodes = 1:N;
nodes_size = 2*ones(1,N);
bnet = mk_bnet(dag, nodes_size, 'names', {'Cloudy', 'S', 'R', 'W', 'Class'}, 'discrete', discrete_nodes);
bnet.CPD{C} = tabular_CPD(bnet, C, [0.5 0.5]);
bnet.CPD{R} = tabular_CPD(bnet, R, [0.8 0.2 0.2 0.8]);
bnet.CPD{S} = tabular_CPD(bnet, S, [0.5 0.9 0.5 0.1]);
bnet.CPD{W} = tabular_CPD(bnet, W, [1 0.1 0.1 0.01 0 0.9 0.9 0.99]);
bnet.CPD{Class} = tabular_CPD(bnet, Class, [0 0.4 0.4 0.9 1 0.6 0.6 0.1]);
evidence = cell(1, N);
evidence{R} = 2;                              % observe Rain = true
engine = var_elim_inf_engine(bnet);
[engine, loglik] = enter_evidence(engine, evidence);
marg = marginal_nodes(engine, Class);
marg.T

This is the code I used. As for jtree, it was giving me an error in MATLAB which has since disappeared, but I think variable elimination is easier to understand as I'm reading through it now.

Thanks.

Pedro.Alonso

1 Answer


The conditional probability table (CPT) for 'class' should have 8 (2*2*2) elements in this case. The posterior output (marg.T) of the inference engine seems right for a binary variable.

It reads as: "with 0.8 probability the 'class' node is in state 1, and with 0.2 probability it is in state 2". From this point on, it is up to the user to decide whether to appoint 'class' to state 1 or 2.
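As a sanity check, you can reproduce that posterior by hand. With Rain observed, the unobserved Sprinkler and Wet nodes marginalize out to 1, so only Cloudy needs to be summed over. A minimal Python sketch (not BNT; it just re-encodes the CPTs from the question with 0/1 states instead of MATLAB's 1/2):

```python
# Hand verification of P(Class | Rain = true) using the question's CPTs.
P_C = [0.5, 0.5]                    # P(Cloudy)
P_R_given_C = [[0.8, 0.2],          # P(Rain | Cloudy = false)
               [0.2, 0.8]]          # P(Rain | Cloudy = true)
# P(Class = 1 (Winter) | Cloudy, Rain); Class = 2 entries are the complements
P_class1 = {(0, 0): 0.0, (1, 0): 0.4, (0, 1): 0.4, (1, 1): 0.9}

r = 1  # evidence: Rain = true (MATLAB state 2)
post = [0.0, 0.0]
for c in (0, 1):
    w = P_C[c] * P_R_given_C[c][r]          # weight of this Cloudy state
    post[0] += w * P_class1[(c, r)]         # Class = Winter
    post[1] += w * (1 - P_class1[(c, r)])   # Class = Summer

z = sum(post)
post = [p / z for p in post]
print(post)   # -> [0.8, 0.2], matching marg.T
```

The two weights are P(C=false, R=true) = 0.5 * 0.2 = 0.1 and P(C=true, R=true) = 0.5 * 0.8 = 0.4, which after normalization give 0.2 * 0.4 + 0.8 * 0.9 = 0.8 for Winter.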

When it comes to classification, in the simplest (and not very advisable) case, you can define a posterior probability threshold of 0.5 and say:

if P(class=1)> 0.5
class = 1
else
class = 2
end

In assessing the performance of your binary classifier, you can look into predictive accuracy, the Area Under the ROC Curve (AUC), or more intelligent measures that take into account the prior probabilities of the 'class' states.
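For example, both accuracy and AUC can be computed without any toolbox. A hypothetical Python sketch (the labels and posteriors below are made up purely to show the mechanics; AUC is computed via the rank statistic, i.e. the probability that a random positive is scored above a random negative):

```python
# Hypothetical example: true labels and P(class=1) posteriors for 6 cases.
y_true = [1, 1, 1, 0, 0, 0]
p_class1 = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]

# Accuracy with a 0.5 threshold
y_pred = [1 if p > 0.5 else 0 for p in p_class1]
accuracy = sum(yp == yt for yp, yt in zip(y_pred, y_true)) / len(y_true)

# AUC: fraction of (positive, negative) pairs ranked correctly, ties count 0.5
pos = [p for p, y in zip(p_class1, y_true) if y == 1]
neg = [p for p, y in zip(p_class1, y_true) if y == 0]
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))

print(accuracy, auc)   # -> 0.666..., 0.888...
```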

P.S. You say the junction tree engine does not work in this case, but it should. You may be missing something; there should be a junction tree example (I don't remember exactly what the .m file was called) in the BNT toolbox .zip file. If you use the junction tree inference engine, you will see that you get the same answer as with variable elimination.
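The reason the two engines must agree is that both compute the exact same marginal, just with different factor orderings. A Python sketch (not BNT) of the most naive "engine" of all, brute-force enumeration of the full joint over all five nodes with Rain clamped, using the question's CPTs re-encoded with 0/1 states:

```python
from itertools import product

# The question's CPTs, 0-indexed (MATLAB state 1 -> 0, state 2 -> 1).
P_C = [0.5, 0.5]
P_S = [[0.5, 0.5], [0.9, 0.1]]                    # P(Sprinkler | Cloudy)
P_R = [[0.8, 0.2], [0.2, 0.8]]                    # P(Rain | Cloudy)
P_W = {(0, 0): [1, 0],    (1, 0): [0.1, 0.9],     # P(Wet | Sprinkler, Rain)
       (0, 1): [0.1, 0.9], (1, 1): [0.01, 0.99]}
P_Cl = {(0, 0): [0, 1],    (1, 0): [0.4, 0.6],    # P(Class | Cloudy, Rain)
        (0, 1): [0.4, 0.6], (1, 1): [0.9, 0.1]}

# Brute force: sum the full joint over all hidden nodes, Rain fixed to true.
r = 1  # evidence: Rain = true
post = [0.0, 0.0]
for c, s, w, cl in product(range(2), repeat=4):
    joint = (P_C[c] * P_S[c][s] * P_R[c][r]
             * P_W[(s, r)][w] * P_Cl[(c, r)][cl])
    post[cl] += joint

z = sum(post)
norm = [p / z for p in post]
print(norm)   # -> [0.8, 0.2], same as var_elim and jtree would give
```

Variable elimination and junction tree are just efficient ways of organizing this same sum, so any exact engine must return [0.8, 0.2] here.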

Zhubarb
  • Yes, your code looks correctly implemented. Now, things start to get slightly trickier when you have to learn the CPTs from data (instead of specifying them manually), especially in cases where there is missing data. But that is not your concern for the time being :) – Zhubarb Jul 30 '13 at 16:02
  • Nope, not right now, but what exactly do you mean by a little trickier? How so? I would have thought that it should be easier, since you have the data and can just smooth it to account for missing values. – Pedro.Alonso Jul 30 '13 at 16:28
  • Smoothing the data has many side-effects. And handling missing categorical data can be tricky. Usually you end up having to deal with Expectation maximisation or multiple imputation. Have a look at this: http://stackoverflow.com/questions/9569886/matlab-missing-data-handling-in-categorical-data – Zhubarb Jul 30 '13 at 16:38
  • Aha, I see. I was thinking about Laplace smoothing, but I guess with this type of data it is difficult. Thanks :) – Pedro.Alonso Jul 30 '13 at 16:51