
I am trying to extract rules from a C5.0 model while processing in parallel. This answer helped me extract the rules from the model object. However, since I need the models to be built in parallel, I am using foreach. This seems to cause a problem with the unexported function, because it does not see the data object. Here is some reproducible code:

```r
library(foreach)
library(doMC)
registerDoMC(2)

j = c(1,2)
result = foreach(i = j) %dopar% {
  library(C50)
  d = iris
  model <- C5.0(Species ~ ., data = d)
  modParty <- C50:::as.party.C5.0(model)
  return(modParty)
}
```

In this example it simply fits the same model twice. In my real code, d is a different sample in each iteration, and that sample is also generated inside the foreach loop.

Debugging showed that the offending line is modParty <- C50:::as.party.C5.0(model). It throws the error

Error in { : task 1 failed - "Object 'd' not found"

even though d is definitely available on every worker in the cluster. I verified this by logging to a file with loginfo(ls()) from the logging package.

Why does the function not see the object d? Any help is greatly appreciated.

For additional information, here is the output of traceback():

```
> traceback()
3: stop(simpleError(msg, call = expr))
2: e$fun(obj, substitute(ex), parent.frame(), e$data)
1: foreach(i = j) %dopar% {
       library(C50)
       d = iris
       model <- C5.0(Species ~ ., data = d)
       modParty <- C50:::as.party.C5.0(model)
       return(modParty)
   }
```

Edit

To clarify: this has nothing to do with foreach. The same error occurs with an ordinary function:

```r
library(C50)

d = iris

getC50Party = function(dat){
  model <- C5.0(Species ~ ., data = dat)
  modParty <- C50:::as.party.C5.0(model)
  return(modParty)
}

c50Party = getC50Party(d)
```

Error in { : task 1 failed - "Object 'dat' not found"

The problem is that as.party.C5.0 tries to access the data object in the global workspace rather than in the environment where the model was fitted.
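The symptom can be reproduced without C50 at all. The following is a minimal base-R sketch, not C50's actual code: `toy_fit()`, `toy_model_frame()` and `get_toy_frame()` are hypothetical names, and it assumes (as the comments below suggest) that the method rebuilds the data by re-evaluating the `data` argument stored in the model's call, here deliberately in the global environment to mimic "the overall workspace".

```r
# Hypothetical fitter: like many modelling functions, it stores its matched
# call, so `object$call$data` holds the symbol the caller passed (`dat`).
toy_fit <- function(formula, data) {
  mf <- model.frame(formula, data = data)  # stand-in for the actual fitting
  list(call = match.call(), n = nrow(mf))
}

# Hypothetical method that rebuilds the data by evaluating that symbol in the
# global environment instead of in the environment where the model was fitted.
toy_model_frame <- function(object) {
  eval(object$call$data, envir = globalenv())
}

# Mirrors getC50Party(): `dat` exists only inside this function's frame.
get_toy_frame <- function(dat) {
  model <- toy_fit(Species ~ ., data = dat)
  toy_model_frame(model)
}

# Capture the error message instead of stopping.
res <- tryCatch(get_toy_frame(iris), error = conditionMessage)
print(res)
```

Because `dat` only exists inside `get_toy_frame()`, the lookup fails with an "object 'dat' not found" error (in an English locale), which is the same symptom as in the question.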

Jonny
  • I'm not sure what exactly is going on here because `partykit` (which I co-authored) is not involved at all. It all comes down to what the `C50` package does with its models. My preliminary assessment is that `C5.0` objects would do better to preserve their `terms` rather than re-building them inside the `as.party` method. But I really haven't dug deep enough into this. As the `C50` maintainer does not seem to follow SO at the moment, I would recommend contacting him directly about this. But maybe the `as.party` method is not exported for a reason... – Achim Zeileis May 29 '16 at 23:04
  • Yes, it seems so. The `as.party.C5.0` method tries to access `d` not from the `model` object but from the global workspace (not even from the worker's workspace). I already emailed Max and sent him the link to the SO question, but unfortunately I got no reply... I think for now I cannot use this package in parallel. I even looked at the source of the function, but I guess I am too much of a newbie for that :/ – Jonny May 30 '16 at 10:45
  • @AchimZeileis I think I found the bug now. The function `as.party.C5.0` calls another unexported function, `model.frame.C5.0`. In this method, the base function `eval()` is called with the wrong environment (one in which, in this case, the `d` object can no longer be found). The solution is to insert `env = parent.frame(2)`, as the environment needs to be the second parent frame. Is there any way for me to edit the source so that it uses the fix in case the maintainer does not see this? I already tried [this](http://stackoverflow.com/questions/3384598) – Jonny May 30 '16 at 15:09
  • Yes, this is the "obvious" problem (if you have worked with formulas/terms/model frames before), but the `parent.frame(2)` solution is probably not the best or most robust one. I think it would be preferable to preserve the `terms` (as I wrote above), which do preserve the environment information. At least this is what I would try to implement if I were the maintainer. If Max does not respond to your queries, the easiest solution is to download the .tar.gz of the package from CRAN, modify the sources, and rebuild and install the package on your machine. – Achim Zeileis May 30 '16 at 15:15
  • Oh I see, now your first comment makes even more sense to me ;-) Sorry, this is my first time looking deeper into a package. For now I've done as you suggested: I downloaded the source, changed it (but only with the `parent.frame(2)` solution), and installed the modified package. It seems to work with the `as.party()` and `fitted()` functions. But a bug fix by the maintainer would still be great! – Jonny May 30 '16 at 19:09
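The two fixes discussed in these comments can be compared on a toy model. This is a hypothetical sketch that does not reproduce C50's internals: `toy_fit()`, `frame_via_parent()`, `frame_via_terms()`, `as_party_toy()`, `call_through()`, `direct()` and `indirect()` are made-up names. It relies only on base R behaviour: `parent.frame(2)` counts call frames, while a `terms` object keeps the environment of the formula it was built from.

```r
# Hypothetical fitter: stores its matched call plus the terms of the model
# frame. A terms object carries the environment its formula was created in.
toy_fit <- function(formula, data) {
  mf <- model.frame(formula, data = data)
  list(call = match.call(), terms = attr(mf, "terms"))
}

# Patch 1: evaluate the stored `data` symbol two frames up the call stack.
# Correct only if the user's function is exactly two calls above this one.
frame_via_parent <- function(object) {
  eval(object$call$data, envir = parent.frame(2))
}

# Patch 2: evaluate it in the formula's environment, taken from the terms.
frame_via_terms <- function(object) {
  eval(object$call$data, envir = environment(object$terms))
}

# Stand-in for as.party(): one call level between user code and the lookup.
as_party_toy <- function(model, lookup) lookup(model)

# Stand-in for an extra helper layer; defined at top level, so its frame
# neither contains nor encloses the user's `dat`.
call_through <- function(model, lookup) as_party_toy(model, lookup)

# User code calling the method directly: the depth patch 1 was written for.
direct <- function(dat, lookup) {
  model <- toy_fit(Species ~ ., data = dat)
  as_party_toy(model, lookup)
}

# User code going through one extra layer: the call frames shift by one.
indirect <- function(dat, lookup) {
  model <- toy_fit(Species ~ ., data = dat)
  call_through(model, lookup)
}

identical(direct(iris, frame_via_parent), iris)    # TRUE
# Patch 1 with an extra layer: looks in call_through()'s frame and fails.
tryCatch(indirect(iris, frame_via_parent), error = conditionMessage)
identical(direct(iris, frame_via_terms), iris)     # TRUE
identical(indirect(iris, frame_via_terms), iris)   # TRUE
```

The `parent.frame(2)` patch works only at the call depth it was written for, whereas evaluating in `environment(object$terms)` does not depend on how the method is reached, which is presumably why preserving the `terms` is the more robust option.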

1 Answer


This is a bug. We do follow Achim's advice and use the terms object, except when we get it wrong.

Try installing the development version from GitHub:

```r
devtools::install_github("topepo/C5.0/pkg/C50")
```

Your examples work with this version.

topepo