
I would like to run a VariableElimination query with multiple possible evidence values using the pgmpy framework. The case is that I have a variable with 4 possible states, and I want to know the conditional probability given that 3 of those states are eligible. However, it only seems possible to pass a single value as exact evidence to VariableElimination:

print(infer_non_adjust.query(variables=["success"],
                             evidence={'cpu_utilization_pod': 'Mid'}))

Is it possible to condition on evidence with more than one value, e.g. on both 'Mid' and 'High'? I'm unsure whether this is feasible using virtual evidence, since there is very little documentation and there are only a few examples on that.
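To state the target explicitly (using the two states from the example query above), the quantity I am after is P(success | cpu_utilization_pod is 'Mid' or 'High'), i.e. the conditional distribution of success given that the evidence variable is in a set of allowed states rather than in a single state.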

Bob West

1 Answer


Unfortunately, there is no direct function to do this yet, but there are a couple of ways to still compute the probability in such cases.

  1. Approximate inference using sampling: The idea is to simulate samples from the model and compute the distribution of the query variable over the samples that satisfy either of the evidence assignments. Example:
from pgmpy.utils import get_example_model
alarm_model = get_example_model('alarm')
samples = alarm_model.simulate(int(1e5))

# Here we want our evidence to be either of the two dicts.
or_evidence = [{'CO': 'LOW', 'PVSAT': 'LOW'}, {'CO': 'LOW', 'PVSAT': 'NORMAL'}]

# Computing the distribution of P(BP | or_evidence[0] or or_evidence[1])
evidence_mask = ((samples.CO == 'LOW') & (samples.PVSAT == 'LOW')) | ((samples.CO == 'LOW') & (samples.PVSAT == 'NORMAL'))
evidence_prob = samples.loc[evidence_mask, 'BP'].value_counts() / samples.shape[0]

# Normalizing the values to get a distribution
print(evidence_prob / evidence_prob.sum())

BP
LOW       0.774212
NORMAL    0.188372
HIGH      0.037415
Name: count, dtype: float64
  2. Exact inference using Variable Elimination: We will use the property that P(A | B) = P(A, B) / P(B) in this case.
from pgmpy.inference import VariableElimination
infer = VariableElimination(alarm_model)

# Compute the joint distribution on all query & evidence variables and just the evidence variables
joint_dist = infer.query(['BP', 'CO', 'PVSAT'])
joint_evid_dist = infer.query(['CO', 'PVSAT'])

# Compute the conditional distribution for each evidence assignment using the formula above
prob_evid_1 = joint_dist.reduce([('CO', 'LOW'), ('PVSAT', 'LOW')], inplace=False).values / joint_evid_dist.get_value(CO='LOW', PVSAT='LOW')
prob_evid_2 = joint_dist.reduce([('CO', 'LOW'), ('PVSAT', 'NORMAL')], inplace=False).values / joint_evid_dist.get_value(CO='LOW', PVSAT='NORMAL')

# Combine the results and get the final distribution
total_prob = prob_evid_1 + prob_evid_2
normalized = total_prob / total_prob.sum()
print(normalized)

[0.7570089  0.20347757 0.03951353]
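As a side note (a sketch, not part of the original recipe): the reduced slices of joint_dist are unnormalized joints, so they can also be summed directly and normalized once at the end. This weights each evidence assignment by its own probability, i.e. P(BP | E1 or E2) = (P(BP, E1) + P(BP, E2)) / (P(E1) + P(E2)). Reusing joint_dist from the snippet above:

# Unnormalized slices P(BP, CO='LOW', PVSAT='LOW') and P(BP, CO='LOW', PVSAT='NORMAL')
joint_e1 = joint_dist.reduce([('CO', 'LOW'), ('PVSAT', 'LOW')], inplace=False).values
joint_e2 = joint_dist.reduce([('CO', 'LOW'), ('PVSAT', 'NORMAL')], inplace=False).values

# Sum the slices and normalize once; the denominator is P(E1) + P(E2)
mixture = joint_e1 + joint_e2
print(mixture / mixture.sum())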

As you can see, the values from these two approaches are close. You can also increase the sample size passed to the simulate method in the first approach to get more accurate results. A drawback of the second approach is that if you have many evidence or query variables, computing the joint distributions can run out of memory.
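Applied back to the model from the question (a sketch only: the network object, here called model, and the state names 'Mid' / 'High' are taken from the question and assumed to exist), the sampling approach would look like this:

# Simulate samples from the question's (hypothetical) model
samples = model.simulate(int(1e5))

# Keep the samples where cpu_utilization_pod is either 'Mid' or 'High'
mask = samples['cpu_utilization_pod'].isin(['Mid', 'High'])

# Empirical estimate of P(success | cpu_utilization_pod in {'Mid', 'High'})
print(samples.loc[mask, 'success'].value_counts(normalize=True))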

Ankur Ankan