When I use a DataFrame whose column names are plain strings, the code works fine. Now I need to pass a DataFrame whose column names are tuples to the MaximumLikelihoodEstimator.estimate_cpd() method from pgmpy.estimators, but it raises KeyError: "None of [Index(['Consumption', 0], dtype='object')] are in the [columns]"
I am trying to use MaximumLikelihoodEstimator.estimate_cpd() on a standard Bayesian network to generate the CPDs for a Dynamic Bayesian Network (DBN), since that method is not yet implemented for DBNs. So I want to generate a CPD table with the node names used in DBNs, which are tuples.
The altered DataFrame categorical_energy_production is similar to this example table:
DateTime | ('Consumption', 0) | ('Production', 0) | ('Nuclear', 0) | ('Oil and Gas', 0) | ('Hydroelectric', 0) |
---|---|---|---|---|---|
2019-01-01 00:00:00 | high | low | low | high | low |
2019-01-01 01:00:00 | high | high | low | high | high |
Code:
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator

# Building the network with tuple node names works
energy_production_model = BayesianNetwork([(('Nuclear', 0), ('Production', 0)),
                                           (('Oil and Gas', 0), ('Production', 0)),
                                           (('Hydroelectric', 0), ('Production', 0))])
estimator = MaximumLikelihoodEstimator(energy_production_model, categorical_energy_production)
cpd_production = estimator.estimate_cpd(('Production', 0))
# ^ This raises the error KeyError: "None of [Index(['Consumption', 0], dtype='object')] are in the [columns]"
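To make clear what output I am after, here is a minimal pure-Python sketch of the counting I expect for ('Production', 0) given its three parents, using tuple node names directly. It is only an illustration of maximum likelihood estimation by relative frequencies, not pgmpy's implementation. The helper name estimate_cpd_by_counting is hypothetical, and the two rows are copied from the example table above.

```python
from collections import Counter, defaultdict

# The two rows of the example table, keyed by the tuple node names.
rows = [
    {("Consumption", 0): "high", ("Production", 0): "low", ("Nuclear", 0): "low",
     ("Oil and Gas", 0): "high", ("Hydroelectric", 0): "low"},
    {("Consumption", 0): "high", ("Production", 0): "high", ("Nuclear", 0): "low",
     ("Oil and Gas", 0): "high", ("Hydroelectric", 0): "high"},
]


def estimate_cpd_by_counting(rows, child, parents):
    """Hypothetical helper: P(child | parents) as relative frequencies.

    Only parent configurations that actually occur in `rows` are returned.
    """
    counts = defaultdict(Counter)
    for row in rows:
        parent_state = tuple(row[p] for p in parents)
        counts[parent_state][row[child]] += 1

    cpd = {}
    for parent_state, child_counts in counts.items():
        total = sum(child_counts.values())
        cpd[parent_state] = {state: n / total for state, n in child_counts.items()}
    return cpd


child = ("Production", 0)
parents = [("Nuclear", 0), ("Oil and Gas", 0), ("Hydroelectric", 0)]
cpd_production = estimate_cpd_by_counting(rows, child, parents)

for parent_state, distribution in cpd_production.items():
    print(parent_state, distribution)
```

With only two rows, each observed parent configuration maps to a single Production state with probability 1.0. The sketch does not cover parent configurations that never occur in the data.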