I have trained a Bayesian network using pgmpy library. I wish to find the joint probability of a new event (as the product of the probability of each variable given its parents, if it has any).
Currently I am doing
infer = VariableElimination(model)
evidence = dict(x_test.iloc[0])
result = infer.query(variables=[], evidence=evidence, joint=True)
print(result)
Here x_test
is the test dataframe.
The result
is very large output with all combination of train data with their probabilities.
+----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+------------------------------------------+-----------------+---------------------------+-----------------------------------------+------------------------------+------------------------+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| data_devicetype | data_username | data_applicationtype | event_type | servicename | data_applicationname | tenantname | data_origin | geoip_country_name | phi(data_devicetype,data_username,data_applicationtype,event_type,servicename,data_applicationname,tenantname,data_origin,geoip_country_name) |
+==============================================================================================================================================+====================================+==========================================+=================+===========================+=========================================+==============================+========================+===========================+=================================================================================================================================================+
| data_devicetype(Mozilla_5_0_Windows_NT_10_0_Win64_x64_AppleWebKit_537_36_KHTML_like_Gecko_Chrome_94_0_4606_81_Safari_537_36) | data_username(christofer) | data_applicationtype(Custom_Application) | event_type(sso) | servicename(saml_runtime) | data_applicationname(GD) | tenantname(amx-sni-ksll0) | data_origin(1_0_64_66) | geoip_country_name(Japan) | 0.0326 |
+----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+------------------------------------------+-----------------+---------------------------+-----------------------------------------+------------------------------+------------------------+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| data_devicetype(Mozilla_5_0_Windows_NT_10_0_Win64_x64_AppleWebKit_537_36_KHTML_like_Gecko_Chrome_94_0_4606_81_Safari_537_36) | data_username(marty) | data_applicationtype(Custom_Application) | event_type(sso) | servicename(saml_runtime) | data_applicationname(VAULT) | tenantname(login_pqr_com) | data_origin(1_0_64_66) | geoip_country_name(Japan) | 0.0156 |
+----------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+------------------------------------------+-----------------+---------------------------+-----------------------------------------+------------------------------+------------------------+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| data_devicetype(Mozilla_5_0_Windows_NT_10_0_Win64_x64_AppleWebKit_537_36_KHTML_like_Gecko_Chrome_94_0_4606_81_Safari_537_36) | data_username(lincon) | data_applicationtype(Custom_Application) | event_type(sso) | servicename(saml_runtime) | data_applicationname(apps_think4ch_com) | tenantname(login_abc_com) | data_origin(1_0_64_66) | geoip_country_name(Japan) | 0.0113 |
......contd
Please help me as to how can I find out the probability of a new event(i.e., a row in test data). The probability expression is P(data_devicetype, data_username, data_applicationtype, event_type, servicename, data_applicationname, tenantname, data_origin, geoip_country_name)