I am working on a dataset with 5 columns (named 'Healthy', 'Growth', 'Refined', 'Reasoned', 'Accepted') and 50k rows. I split it into a training set (10k rows) and a validation set (the rest of the dataset). I built a Bayesian Belief Network with the following edges: ('Healthy', 'Refined'), ('Healthy', 'Reasoned'), ('Refined', 'Accepted'), ('Reasoned', 'Accepted'), ('Growth', 'Accepted'). To evaluate the quality of my network, I would like to enter evidence for the nodes 'Healthy', 'Growth', 'Refined' and 'Reasoned', predict the value of 'Accepted', and compare it with the actual value in the validation set. The for loop I wrote always stops after 584 iterations without raising any error, and the kernel still appears busy.
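In this network every parent of 'Accepted' ('Refined', 'Reasoned', 'Growth') is observed, and 'Healthy' is a non-descendant of 'Accepted', so by the local Markov property P(Accepted | Healthy, Growth, Refined, Reasoned) is simply the row of the 'Accepted' CPD for those parent values. The evaluation described above can therefore be sketched without any sampling. The following is a minimal standard-library sketch, assuming MLE parameters (relative frequencies) and an argmax prediction; the toy rows and the helper names `fit_accepted_cpd` and `predict` are hypothetical and not part of pgmpy.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows (Refined, Reasoned, Growth, Accepted); NOT the real dataset.
train = [
    ("A", "A", "B", "A"),
    ("A", "A", "B", "A"),
    ("A", "A", "B", "B"),
    ("C", "B", "A", "C"),
    ("C", "B", "A", "C"),
]
validation = [
    ("A", "A", "B", "A"),
    ("C", "B", "A", "B"),
    ("E", "E", "E", "A"),  # parent combination never seen in training
]


def fit_accepted_cpd(rows):
    """MLE of P(Accepted | Refined, Reasoned, Growth): relative frequencies per parent combination."""
    counts = defaultdict(Counter)
    for refined, reasoned, growth, accepted in rows:
        counts[(refined, reasoned, growth)][accepted] += 1
    cpd = {}
    for parents, counter in counts.items():
        total = sum(counter.values())
        cpd[parents] = {state: n / total for state, n in counter.items()}
    return cpd


def predict(cpd, refined, reasoned, growth):
    """Most probable Accepted state, or None if the parent combination was never observed."""
    row = cpd.get((refined, reasoned, growth))
    if row is None:
        return None
    return max(row, key=row.get)


cpd = fit_accepted_cpd(train)
predictions = [predict(cpd, r, s, g) for r, s, g, _ in validation]
hits = sum(p == actual for p, (_, _, _, actual) in zip(predictions, validation))
print(predictions)  # ['A', 'C', None]
print(f"accuracy: {hits / len(validation):.3f}")
```

With the real model, the same table could presumably be read from `model_mle.get_cpds('Accepted')`, or the posterior could be computed exactly with pgmpy's `VariableElimination` instead of being sampled.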
Here is a simplified version of my code. I only show the version of the network that uses maximum likelihood estimation for the parameters; the issue is the same with the other parameter estimation methods.
```python
import pandas as pd
from pgmpy.base import DAG
from pgmpy.models import BayesianNetwork
from pgmpy.sampling import BayesianModelSampling
from pgmpy.factors.discrete import State

# import dataset
df = pd.read_csv("C:\\Users\\puddu\\Desktop\\Tools\\Dummy.BBN\\Dummy_data_set.csv")

# preliminary operations on the dataset
df.rename(columns={'Q1.Healthy': 'Healthy', 'Q2.Growth': 'Growth',
                   'Q3.Refined': 'Refined', 'Q9.Accepted': 'Accepted',
                   'Q8.Reasoned': 'Reasoned'}, inplace=True)
nodes = ('Healthy', 'Growth', 'Refined', 'Reasoned', 'Accepted')
replies = ['E', 'D', 'C', 'B', 'A']
edges = [('Healthy', 'Refined'),
         ('Healthy', 'Reasoned'),
         ('Refined', 'Accepted'),
         ('Reasoned', 'Accepted'),
         ('Growth', 'Accepted')]

# every node is an ordered categorical with the five possible replies
for nod in nodes:
    df[nod] = df[nod].astype('category')
    df[nod] = df[nod].cat.set_categories(replies, ordered=True)

# training set definition (first 10,000 rows)
df_train = df.head(10000).copy().reset_index(drop=True)

# directed acyclic graph building
dag = DAG()
dag.add_edges_from(ebunch=edges)

# BBN building + estimating MLE parameters
model_mle = BayesianNetwork(dag)
model_mle.fit(df_train)

# validation set: the next 1,000 rows
df_validation = df.iloc[10000:11000].copy().reset_index(drop=True)

inference_mle = BayesianModelSampling(model_mle)
mle_guesses = 0
for i in range(1000):
    # evidence from the four observed nodes of row i
    evidence = [State(var='Growth', state=df_validation['Growth'][i]),
                State(var='Healthy', state=df_validation['Healthy'][i]),
                State(var='Reasoned', state=df_validation['Reasoned'][i]),
                State(var='Refined', state=df_validation['Refined'][i])]
    # draw one sample consistent with the evidence and read its 'Accepted' value
    mle_prediction = inference_mle.rejection_sample(
        size=1, evidence=evidence, show_progress=False)['Accepted'][0]
    result = df_validation['Accepted'][i]
    if mle_prediction == result:
        mle_guesses += 1
    print(f"Step {i}")
```
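One hypothesis worth checking (not confirmed here): rejection sampling draws complete forward samples from the network and discards every sample that disagrees with the evidence, so the expected number of draws is 1/P(evidence). If the fitted CPDs give the evidence of the row where the loop stalls probability zero (for example, a Healthy/Refined pair that never occurs in the 10k training rows), no sample is ever accepted and the loop would never finish. The sketch below is a toy, standalone rejection sampler on a hypothetical two-node fragment with made-up CPD values; it is not pgmpy's implementation, and the `max_tries` cap is an illustrative addition.

```python
import random

# Hypothetical fragment Healthy -> Refined with made-up MLE-style CPDs.
# P(Refined='E' | Healthy='A') is 0, as if that pair never appeared in training.
P_HEALTHY = {"A": 0.7, "E": 0.3}
P_REFINED_GIVEN_HEALTHY = {
    "A": {"A": 1.0, "E": 0.0},
    "E": {"A": 0.4, "E": 0.6},
}


def draw(dist, rng):
    """Sample one state from a {state: probability} dict."""
    states = list(dist)
    return rng.choices(states, weights=[dist[s] for s in states], k=1)[0]


def forward_sample(rng):
    """Sample parents before children, following the edge Healthy -> Refined."""
    healthy = draw(P_HEALTHY, rng)
    refined = draw(P_REFINED_GIVEN_HEALTHY[healthy], rng)
    return {"Healthy": healthy, "Refined": refined}


def rejection_sample(evidence, rng, max_tries=10_000):
    """Return (sample, tries) for the first forward sample matching the evidence.

    Returns (None, max_tries) if nothing is accepted; without the cap the loop
    would run forever whenever the evidence has probability zero.
    """
    for tries in range(1, max_tries + 1):
        sample = forward_sample(rng)
        if all(sample[var] == state for var, state in evidence.items()):
            return sample, tries
    return None, max_tries


rng = random.Random(0)
print(rejection_sample({"Healthy": "E", "Refined": "E"}, rng))  # P(evidence) = 0.18, accepted quickly
print(rejection_sample({"Healthy": "A", "Refined": "E"}, rng))  # (None, 10000)
```

Even when the evidence is possible, a single accepted sample is a random draw from P(Accepted | evidence) rather than its most probable value, which also affects the accuracy measured by the loop above.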
Thanks to everyone who takes the time to help me.