
I am using R for some analysis. I wrote my own functions, and the R script runs perfectly on Mac OS.

However, when I run the same R script on 64-bit Windows, I run into some strange problems. For instance, after installing and loading the package plyr, I can call the function laply directly. But when I run my own function, which calls laply, it returns an error stating "could not find function laply".

Also, since I want to run computations in parallel, I loaded the package doParallel and use it together with foreach. However, one of my functions returns an error stating that it could not find function %do%, while my other functions do not. This is very strange to me, and I have no clue how to solve it.

The error occurs in the function Func.prune. Basically, it examines the association rules and finds the redundant rules based on the lift value. The function and some input data are shown below.

rules <- list(Ant=list(c("CDWP = 3","CT in [369.38; 450.629]"),
                   c("CDWP = 3","Month = 3"),
                   c("Month = 3","PCHWP = 3"),
                   c("CDWP = 3","Month = 3"),
                   c("CDWP = 3","Month = 3","PCHWP = 3")),
          Con=list("PCHWP = 3",
                   "WCC in [1040.528; 1882.797]",
                   "WCC in [1040.528; 1882.797]",
                   c("PCHWP = 3","WCC in [1040.528; 1882.797]"),
                   "WCC in [1040.528; 1882.797]"))

rules.m=data.frame(Freq=c(1760,rep(1740,4)),
               Supp=c(0.2821,rep(0.2788,4)),
               Conf=rep(1,5),
               Lift=c(1.814250,1.946198,1.946198,2.028336,1.946198))

accuracy=50

Func.prune <- function(rules, rules.m, accuracy) {
require(foreach)
require(doParallel)
require(plyr)
registerDoParallel(cores=12)

item.ant <- llply(.data=rules$Ant, .fun=function(x) sapply(strsplit(x=x, split=" "), FUN=function(x) x[1]))
item.con <- llply(.data=rules$Con, .fun=function(x) sapply(strsplit(x=x, split=" "), FUN=function(x) x[1]))

res.prune <- foreach(i=1:length(item.ant)) %dopar% {
ant.ori <- rules$Ant[[i]]
con.ori <- rules$Con[[i]]
ant <- item.ant[[i]]
con <- item.con[[i]]
res.1 <- sapply(X=item.ant, FUN=function(x) {
  if((length(x)<length(ant)) && (length(which(x %in% ant))==length(x))) {out=1} else {out=0} 
  return(out)})
res.2 <- sapply(X=item.con, FUN=function(x) {
  if(length(x)==length(con) && length(which(x%in%con))==length(x)) {out=1} else {out=0}
  return(out)
})
ind.sub.cand <- which(res.1==1 & res.2==1)
if(length(ind.sub.cand)==0) {final.upd=0} else {
  #To check whether the consequent of sub candidate is the same with the consequent of considered rules
  #Need to define accuracy to join similar ranges
  ind.filt <- foreach (j = 1:length(ind.sub.cand), .combine=c) %do% {
    ant.cand <- rules$Ant[[ind.sub.cand[j]]]
    con.cand <- rules$Con[[ind.sub.cand[j]]]
    con.cand.ind <- foreach(m = 1:length(con.cand), .combine=c) %do% {
      if(length(grep(pattern="=", x=con.cand[m]))==1) {
        out.ind=ifelse(sapply(X=strsplit(x=con.cand[m], split=" = "), FUN=function(x) x[2])==sapply(X=strsplit(con.ori[grep(pattern=sapply(X=strsplit(x=con.cand[m], split=" = "), FUN=function(x) x[1]), x=con.ori)], split=" = "), FUN=function(x) x[2]), yes=T, no=F)
      } else {
        name <- sapply(strsplit(x=con.cand[m], split=" in "), FUN=function(x) x[1])
        low.ori <- sapply(strsplit(x=sapply(X=strsplit(x=con.ori[grep(pattern=name, x=con.ori)], split=" in "), FUN=function(x) x[2]), split="; "), FUN=function(x) x[1])
        high.ori <- sapply(strsplit(x=sapply(X=strsplit(x=con.ori[grep(pattern=name, x=con.ori)], split=" in "), FUN=function(x) x[2]), split="; "), FUN=function(x) x[2])
        low.ori.upd <- round_any(as.numeric(substr(x=low.ori, start=2, stop=nchar(low.ori))), accuracy=accuracy, f=floor)
        high.ori.upd <- round_any(as.numeric(substr(x=high.ori, start=2, stop=(nchar(high.ori))-1)), accuracy=accuracy, f=ceiling)
        low <- sapply(strsplit(x=sapply(strsplit(x=con.cand[m], split=" in "), FUN=function(x) x[2]), split="; "), FUN=function(x) x[1])
        high <- sapply(strsplit(x=sapply(strsplit(x=con.cand[m], split=" in "), FUN=function(x) x[2]), split="; "), FUN=function(x) x[2])
        low.upd <- round_any(as.numeric(substr(x=low, start=2, stop=nchar(low))), accuracy=accuracy, f=floor)
        high.upd <- round_any(as.numeric(substr(x=high, start=1, stop=(nchar(low)-1))), accuracy=accuracy, f=ceiling)
        out.ind <- ifelse(low.upd==low.ori.upd && high.upd==high.ori.upd, yes=T, no=F)
      }
      return(out.ind)
    }
    con.match <- ifelse(length(which(con.cand.ind==T))==length(con.cand), yes=1, no=0)
  }
  ind.sub.upd <- ind.sub.cand[which(ind.filt==1)]
  if(length(ind.sub.upd)==0) {final.upd=0} else {
    #To check whether the antecedent of sub candidate are subset of the considered rule's antecedent
    out.final <- foreach(q = 1:length(ind.sub.upd), .combine=c) %do% {
      ant.filt <- rules$Ant[[ind.sub.upd[q]]] 
      ant.ind <- foreach(p = 1:length(ant.filt), .combine=c) %do% {
        if (length(grep(pattern=" = ", x=ant.filt[p]))==1) {
          name <- sapply(strsplit(x=ant.filt[[p]], split=" = "), FUN=function(x) x[1])
          ant.ori.value <- ant.ori[grep(pattern=name, x=ant.ori)]
          res.ind <- ifelse(sapply(X=strsplit(x=ant.filt[[p]], split=" = "), FUN=function(x) x[2])==sapply(strsplit(ant.ori.value, split=" = "), FUN=function(x) x[2]), yes=T, no=F)
        } else {
          name <- sapply(strsplit(x=ant.filt[[p]], split=" in "), FUN=function(x) x[1])
          ant.ori.value <- ant.ori[grep(pattern=name, x=ant.ori)]
          low.ori <- sapply(strsplit(x=sapply(X=strsplit(ant.ori.value, split=" in "), FUN=function(x) x[2]), split="; "), FUN=function(x) x[1])
          high.ori <- sapply(strsplit(x=sapply(X=strsplit(ant.ori.value, split=" in "), FUN=function(x) x[2]), split="; "), FUN=function(x) x[2])
          low.ori.upd <- round_any(x=as.numeric(substr(x=low.ori, start=2, stop=nchar(low.ori))), accuracy=accuracy, f=floor)
          high.ori.upd <- round_any(x=as.numeric(substr(x=high.ori, start=1, stop=(nchar(high.ori)-1))), accuracy=accuracy, f=ceiling)
          low <- sapply(strsplit(x=sapply(strsplit(x=ant.filt[p], split=" in "), FUN=function(x) x[2]), split="; "), FUN=function(x) x[1])
          high <- sapply(strsplit(x=sapply(strsplit(x=ant.filt[p], split=" in "), FUN=function(x) x[2]), split="; "), FUN=function(x) x[2])
          low.upd <- round_any(as.numeric(substr(x=low, start=2, stop=nchar(low))), accuracy=accuracy, f=floor)
          high.upd <- round_any(as.numeric(substr(x=high, start=1, stop=(nchar(low)-1))), accuracy=accuracy, f=ceiling)
          res.ind <- ifelse((low.upd>=low.ori.upd) && (high.upd<=high.ori.upd), yes=T, no=F)
        }
        return(res.ind)
      }
      ant.match <- ifelse(length(which(ant.ind==T))==length(ant.filt), yes=1, no=0)
    }
    ind.sub.final <- ind.sub.upd[which(out.final==1)]

    #To check the lift value
    final <- foreach(o = 1:length(ind.sub.final), .combine=c) %do% {
      lift.ori <- rules.m[i, "Lift"]
      lift.sub <- rules.m[ind.sub.final[o], "Lift"]
      v <- ifelse(lift.sub >= lift.ori, yes=T, no=F)
    }
    final.upd <- ifelse(length(which(final==T))==0, yes=0, no=ind.sub.final[which(final==T)])
  }
  return(final.upd)
 }
}
return(res.prune)
}

So when I actually run this function:

Func.prune(rules=rules, rules.m=rules.m, accuracy=accuracy) 

I get the following error: Error in { : task 5 failed - could not find function %do%

Any help is appreciated. Thanks in advance for your help.

Alex Brown
rajafan
  • Have you `require`d the package? – hd1 Jan 15 '14 at 02:41
  • @hd1, Yes, I use library() to load the package and I can actually run the function. But when the function is embedded in my own function, the error returns. In addition, what is the difference between require() and library()? I suppose they are the same. Thanks for your reply. – rajafan Jan 15 '14 at 02:51
  • A [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) (with code showing what runs and what doesn't) will help us diagnose your problem. – Blue Magister Jan 15 '14 at 03:07
  • @BlueMagister, thanks. I have edited the question accordingly. – rajafan Jan 15 '14 at 03:34
  • A *minimal* reproducible example (the most succinct code that describes the problem) would help even more. – Blue Magister Jan 15 '14 at 04:11
  • @BlueMagister, Actually the code I provided above is minimal. It only contains the input data, i.e., rules, rules.m, and accuracy; the function "Func.prune". And you can get the error by running the function. Thanks. – rajafan Jan 15 '14 at 04:19
  • This earlier question may help. http://stackoverflow.com/questions/20704235/function-not-found-in-r-doparallel-foreach-error-in-task-1-failed-cou/20705224#20705224 – Blue Magister Jan 15 '14 at 05:37
  • I removed a period that was causing an error from the call and I get no error (mac 10.7.5/ R 3.0.2). Result is : [[1]] [1] 0 [[2]] [1] 0 [[3]] [1] 0 [[4]] [1] 0 [[5]] [1] 0 – IRTFM Jan 15 '14 at 06:19
  • @BlueMagister, thanks so much! It works. So the key is to set the .packages argument in foreach(). Thanks again! – rajafan Jan 15 '14 at 06:20
  • @IShouldBuyABoat, thanks for your reply. Actually the script runs well on Mac, where I use doMC rather than doParallel. The solution is to set the .packages argument in foreach(). Thanks though. – rajafan Jan 15 '14 at 06:22

1 Answer

Here's a simple example that reproduces the problem:

library(doParallel)
cl <- makePSOCKcluster(6)
registerDoParallel(cl)
foreach(i=1:10) %dopar% {
  foreach(j=1:10) %do% j
}

Because the workers started by makePSOCKcluster haven't loaded the foreach package, you get the error:

Error in { : task 1 failed - "could not find function "%do%""

Adding the .packages='foreach' option to the outer foreach loop fixes the problem:

foreach(i=1:10, .packages='foreach') %dopar% {
  foreach(j=1:10) %do% j
}
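
The function in the question also calls plyr functions such as round_any inside the %dopar% body, so the same option can list more than one package. A minimal sketch, assuming doParallel and plyr are installed; the two workers and the numbers being rounded are made up purely for illustration:

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)       # two workers, chosen only for illustration
registerDoParallel(cl)

# .packages names every package the loop body needs on the workers:
# foreach for the nested %do%, plyr for round_any.
res <- foreach(i = 1:3, .packages = c("foreach", "plyr")) %dopar% {
  foreach(j = 1:2, .combine = c) %do% {
    round_any(i * j * 7, accuracy = 5, f = floor)
  }
}

stopCluster(cl)
```

Each element of res is a numeric vector of length two, one value per inner iteration.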

Note that if you register doParallel with:

registerDoParallel(6)

then the example fails on Windows, but succeeds on Mac OS X and Linux. This is because doParallel uses mclapply in this case on Mac OS X and Linux, so the workers have foreach loaded since they were forked by an R session that had loaded foreach. That is also why the example works with doMC.
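
One way to see this difference is to ask freshly started PSOCK workers what is on their search path. This is only a sketch: clusterEvalQ comes from the parallel package and is not part of the example above, and the two workers are an arbitrary choice.

```r
library(parallel)

cl <- makePSOCKcluster(2)  # two workers, chosen only for illustration

# Each worker reports its own search path; it does not inherit the
# packages attached in the master session.
worker_paths <- clusterEvalQ(cl, search())

# TRUE for a worker only if that worker itself has foreach attached.
has_foreach <- vapply(worker_paths,
                      function(p) "package:foreach" %in% p,
                      logical(1))

stopCluster(cl)
```

Unless a startup file loads foreach on the workers, has_foreach should be FALSE for every worker, which is the situation the .packages option addresses.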


Digression on registerDoParallel

The arguments to registerDoParallel are a bit confusing, since the difference between cl and cores is not clear. I believe the intent is to specify a cluster object with cl or the number of cores with cores, but you can also specify the number of cores with cl. If cl is a number on Windows, then a cluster object is implicitly created for you, since mclapply doesn't run in parallel on Windows. I think this should also happen on Windows if cores is used, but this doesn't work for me with doParallel 1.0.6, which is the current version on CRAN:

> packageVersion('doParallel')
[1] ‘1.0.6’
> registerDoParallel(cores=6)
> getDoParWorkers()
[1] 3

I consider this to be a bug, and will report it to the package maintainer.
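
Whichever argument is used, the registration can be checked rather than assumed. A small sketch, assuming an explicit cluster of two workers (an arbitrary number); getDoParWorkers and getDoParName are both foreach functions:

```r
library(doParallel)

cl <- makeCluster(2)       # explicit cluster object, two workers
registerDoParallel(cl)

n_workers <- getDoParWorkers()  # number of workers foreach will use
backend   <- getDoParName()     # name of the registered backend

stopCluster(cl)
```

If n_workers does not match the number requested, the registration did not do what was intended.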

In any case, I wouldn't use registerDoParallel(cl=makeCluster(6)), since that doesn't give you a way to shut down the cluster object, which is good practice. I would use:

cl <- makeCluster(6)
registerDoParallel(cl)
# do stuff in parallel
stopCluster(cl)

if I wanted to have access to the cluster object in order to export variables to the workers for example, or simply:

registerDoParallel(6)

If a cluster object is implicitly created for you, it will be shut down by the package's .onUnload function.
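
When the explicit cluster is created inside a function, on.exit can make sure it is shut down even if the parallel code fails. This is a sketch of one possible pattern, not something doParallel requires; run_parallel and its n_workers argument are hypothetical names, and resetting to the sequential backend with registerDoSEQ is my own addition.

```r
library(doParallel)

# Hypothetical helper: runs a small parallel loop on its own cluster.
run_parallel <- function(n_workers = 2) {
  cl <- makeCluster(n_workers)
  # Shut the cluster down when the function exits, even on error.
  on.exit(stopCluster(cl), add = TRUE)
  # Then fall back to the sequential backend, so that later foreach
  # calls do not try to use the stopped cluster.
  on.exit(registerDoSEQ(), add = TRUE)

  registerDoParallel(cl)
  foreach(i = 1:4, .combine = c) %dopar% i^2
}

squares <- run_parallel()
```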

Steve Weston
  • Thanks a lot for your reply. It is very detailed and helpful. I have one more question. On Windows, what is the difference between setting the "cl" and "cores" arguments of "registerDoParallel"? I tried registerDoParallel(cl=makeCluster(12)) and registerDoParallel(cores=12), followed by getDoParWorkers(), and both returned 12. Does that mean these two arguments are equivalent and we can set either of them? Thanks in advance for your kind help. – rajafan Jan 16 '14 at 01:54
  • @rajafan Hopefully my updated answer will help. – Steve Weston Jan 16 '14 at 15:32