2

I try to modify object of class bn.fit (bn.fit.dnet) from R's bnlearn library. I need

  1. to set equal probabilities for every row in bn.fit$node$prob table. For this I use next code:
    library(bnlearn)
    library(purrr)
    
    data(insurance)
    
    bn <- tabu(insurance, score = "bic")
    bn_fit <- bn.fit(bn, insurance, method = 'bayes')
    
    bn_fit[1:length(bn_fit)] <- modify(bn_fit[1:length(bn_fit)], function(node) {
      node$prob <- modify(node$prob, ~(1 / NROW(node$prob)))
      node
    })
    

I suppose this approach bit ugly and almost sure there exist more elegant way to do this. I can not remove 1:length(bn_fit). Also I don't know why I can not use NROW(.x) instead of NROW(node$prob) in my code.

  1. To set arbitrary distribution on EVERY column in bn.fit$node$prob table. I don't understand how to avoid for loops in this case.

Related question is here

user20650
  • 24,654
  • 5
  • 56
  • 91
drastega
  • 1,581
  • 5
  • 30
  • 42
  • 1
    This is moderatley less typing but not in the spirit of using tidy package functions `bn_fit[] = lapply(bn_fit, function(node) { node$prob <- modify(node$prob, ~(1 / NROW(node$prob))) node })` – user20650 Jun 20 '20 at 14:00

1 Answers1

1

Regarding (1), modify takes a list or an atomic vector. bn_fit is of class bn.fit, bn.fit.dnet, however, under the hood it is a list too, since calling typeof() yields list. My guess is that there is no S3 generic method for subsetting these classes so R figures out that it is essentially a list and accordingly strips the class arguments. So subsetting bn_fit turns it into class list, and therefore you can use modify on it. Subsetting can even be done with empty brackets [], it will just return the object, but this time as class list. An alternative that I use below is to "manually" set the class attribute to NULL via attr(bnfit, "class") <- NULL.

Regarding (2), I wrote a tidyverse based function which can be used to alter the prob table of each node into a bayesm::rdirichlet distribution (see code below). The user still needs to provide part of the alpha argument (the length argument is given by the length of each prob table). Under the hood the function is relying on purrr::modify. It takes care of the classes by stripping them first and adding them back once the modification is done. My approach is to turn the prob tables into data.frames then modify the Freq column and adjust it for existing other variables (groups) and then translate that data.frame back into a table using xtabs and formulation notation via reformulate.

I'm not so deep into bayesian networks, so I do not know to what extent this function can be generalized, or whether it only works on the dataset you provided. Further, please test if the modified object is accepted by functions expecting class bn.fit, bn.fit.dnet.

I tried to comment each step of my code, but please ask if something is not clear.

(3) Regarding your question why NROW(.x) does not work in your code and you have to use NROW(node$prob) instead: This has to do with the way modify loops over the prob tables. A good way to check over what elements modify is looping over is to use purrr::pluck.

library(bnlearn)
library(tidyverse)

data(insurance)

bn <- tabu(insurance, score = "bic")
bn_fit <- bn.fit(bn, insurance, method = 'bayes')

change_bn_prob_table <- function(bnfit, alpha) {
  
  # save class attribute of bnfit object
  old_class <- attr(bnfit, "class")
  
  # strip class so that `modify` can be used
  attr(bnfit, "class") <- NULL
  
  # loop over `prop` tables of each node
  new <- purrr::modify(bnfit, function(x) {
    
    # save attributes of x 
    old_x_attr <- attributes(x)
    
    # save attributes of x[["prob"]]
    old_xprob_attr <- attributes(x[["prob"]])
    
    # turn `table` into data.frame
    inp <- as.data.frame(x[["prob"]]) 
    # save names apart from `Freq`
    cnames <- inp %>% select(-Freq) %>% colnames
    
    out <- inp %>% 
      # overwrite column `Freq` with probabilities from bayesm::rdirichlet
      # alpha needs to be supplied (the length of alpha is given by `nrow`)
      mutate(Freq := bayesm::rdirichlet(c(rep(alpha, nrow(inp))))) %>% 
      # devide probilities by sum of Freq in all remaining groups
      group_by(!!! syms(cnames[-1])) %>% 
      mutate(Freq := Freq/sum(Freq)) %>% 
      # turn data.frame back into prob table using formula notation via reformulate
      xtabs(reformulate(paste(colnames(.)), "Freq"), .)
    
    # strip `call` attribute from newly generated prob table
    attr(out, "call") <- NULL  
    # add `class` `table` as attribute
    attr(out, "class") <- "table"
    
    # restore old attribues and write x out to x$prob
    attributes(out) <- old_xprob_attr
    x[["prob"]] <- out
    
    # restore old attribues and return x
    attributes(x) <- old_x_attr
    x
    
  })
  
  # add saved class attributes 
  attr(new, "class") <- old_class
  new
  
}
# here `2` is the first part of `alpha` of `bayesm::rdirichlet`
bn_fit2 <- change_bn_prob_table(bn_fit, 2)

# test that `logLik` can be used on new modified bnfit object 
logLik(bn_fit2, insurance)
#> [1] -717691.8

Created on 2020-06-21 by the reprex package (v0.3.0)

TimTeaFan
  • 17,549
  • 4
  • 18
  • 39
  • 1
    You can't assign random numbers like that for the probabilities as then the cpt's are not stochastic (although you could if you normalised them after) – user20650 Jun 20 '20 at 12:06
  • 1) Thank you for bn_fit[] but I need object of class bn.fit not list to use with logLik function 2) Yes all categories should sum up to 1. In particular, I want to use bayesm::rdirichlet distribution. I can only use runif (like you) and have problems with any other distribution – drastega Jun 20 '20 at 12:38
  • Thanks for the clarification I will look into it. – TimTeaFan Jun 20 '20 at 13:05
  • Also I don't know why I can not use `NROW(.x)` instead of `NROW(node$prob)` in my code. – drastega Jun 20 '20 at 13:35
  • Thank you for your job @TimTeaFan. This is not exactly what I need because of error in `logLik(bn_fit2, insurance)` but I've fixed your code and got what I want – drastega Jun 21 '20 at 18:50
  • Thanks for pointing out that `logLik` was not working, I forgot to include a couple of arguments to save and restore the attributes of the list elements the function is working on. I added that code and all should be working now. – TimTeaFan Jun 21 '20 at 21:35
  • @TimTeaFan could you explain why do you use `rlang` operators in your code? – drastega Jun 23 '20 at 13:40
  • 1
    @drastega I think the only `rlang` operator I use is `!!! syms` to splice in a vector of colnames as arguments to `group_by`. The number of variables to group by can differ between each node and this is a convenient way to take care of that. Apart from that I have `:=` in my code, but it’s not necessary and can be replaced by `=`. – TimTeaFan Jun 23 '20 at 13:50