I am trying to translate a set of text queries using parallel processing in R. This is my first time using parallel processing. My code is:

    install.packages("RYandexTranslate")
    install.packages("textcat")
    install.packages("plyr")
    install.packages("parallel")
    library("RYandexTranslate")
    library("textcat")
    library("dplyr")
    library("parallel") 
    api_key <- "trnsl.1.1.20160707T103515Z.90fa575d702ae81e.6ec78e064eb94a1c00a9bc506c615f223cf0cf5b"
    cl <- makeCluster(4) 
    Query_L_German <- c("5 euro muenze stempelglanz","2 euro muenzen uebersicht")
    Par_Conversion <- function(QUery_L_German)
    {
      for(i in 1:length(Query_L_German))
      {
        x <- translate(api_key,Query_L_German[i], "de-en")$text
        return(x)
      }
    }  
    a <- length(Query_L_German)   
    parLapply(cl, seq(a), function(i,Query_L_German,Par_Conversion)
      for(i in 1:length(Query_L_German)){
        x <- Par_Conversion(Query_L_German)
        return(x)
      }, Query_L_German, Par_Conversion)

But I am getting the following error:

    Error in checkForRemoteErrors(val) : 3 nodes produced errors; first error: object 'Query_L_German' not found


1 Answer

When you use `parLapply`, every function and variable used inside the worker function must be made available to the workers explicitly, because each worker started by `makeCluster` runs as a separate R process that does not see the objects in your main session. You can do this by listing them in the `varlist` argument of `clusterExport`. There is an in-depth question/answer on how to do this and other things with `parLapply` if you want to understand more.

Your example can be solved by inserting the following line before parLapply is used:

    clusterExport(cl, varlist = c("api_key","Query_L_German","translate"))
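
The effect of `clusterExport` can be seen in a minimal sketch. It assumes a local two-worker cluster and is meant to be run at top level (for example with `Rscript`). The names `prefix`, `queries` and `fake_translate` are hypothetical stand-ins for `api_key`, `Query_L_German` and `translate`, so the example runs without network access or an API key:

```r
library(parallel)

# Hypothetical stand-ins for the objects in the question; no API call is made.
prefix <- "en: "
queries <- c("5 euro muenze stempelglanz", "2 euro muenzen uebersicht")
fake_translate <- function(text) paste0(prefix, toupper(text))

cl <- makeCluster(2)

# Before exporting, the workers have neither `queries` nor `fake_translate`,
# so the call fails; the error message is captured instead of stopping the script.
res_before <- tryCatch(
  parLapply(cl, seq_along(queries), function(i) fake_translate(queries[i])),
  error = function(e) conditionMessage(e)
)

# Copy the required objects from the main session to every worker.
clusterExport(cl, varlist = c("prefix", "queries", "fake_translate"))

# The same call now succeeds and returns one list element per query.
res_after <- parLapply(cl, seq_along(queries), function(i) fake_translate(queries[i]))

stopCluster(cl)
```

Here `res_before` holds the error message reported by the workers, while `res_after` holds the results. Note that `prefix` has to be exported as well, since `fake_translate` looks it up on the worker.
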
  • Thanks Daniel, for pointing this out. I have updated the code. Could you please look into it now – Akshay Jul 19 '16 at 08:54
  • @Akshay Ok. I can reproduce it now. Please look at my edited answer. Hope it is solved now. Greetings, Daniel – Daniel Jul 19 '16 at 09:11
  • Thanks Daniel. Could you please explain to me how `parLapply()` works? In the above code, I can call the Par_Conversion function directly or use it inside `parLapply()`, but I am not seeing any significant improvement in execution time when I use `parLapply()`. Is it necessary to pass a list as input to this function? What is the role of the `seq()` argument in `parLapply()`? – Akshay Jul 20 '16 at 07:16
  • @Akshay No problem. To be honest, the workings of `parLapply()` would be worth another question, after you have done some research and still have open questions. There are pretty good resources out there on the web! – Daniel Jul 20 '16 at 08:22
  • @Akshay You can look up what `seq()` does by entering `?seq` and `?parLapply` into your console. It will clear things up if you have not understood them so far. If you still have questions after that, it might be appropriate to post a NEW question. – Daniel Jul 20 '16 at 08:23
  • @Akshay Regarding the performance: it is difficult to say why you do not see a speed-up. Parallelization always has some overhead. If your problem is not large enough (as in the example above), then the overhead of parallelization might be bigger than the speed-up. – Daniel Jul 20 '16 at 08:25
  • Thanks Daniel for all the support. I would try this on a larger set of queries and then compare the result. – Akshay Jul 20 '16 at 09:57
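
The points raised in the comments (the role of `seq()`, whether a list is required, and the overhead) can be illustrated with a small sketch. It assumes a local two-worker cluster and uses base R's `nchar` as a stand-in for the translation call, so that no API key or package export is needed:

```r
library(parallel)

queries <- c("5 euro muenze stempelglanz", "2 euro muenzen uebersicht")

# seq(length(queries)) is just the index vector 1, 2. parLapply calls the
# function once per element of its second argument and spreads those calls
# across the workers.
idx <- seq(length(queries))

cl <- makeCluster(2)

# Variant 1: iterate over indices; `queries` is passed along as an extra argument.
by_index <- parLapply(cl, idx, function(i, q) nchar(q[i]), q = queries)

# Variant 2: iterate over the elements directly, so no index vector is needed.
# A plain vector works; the input does not have to be a list.
by_element <- parLapply(cl, queries, nchar)

# For inputs this small, sending work to the workers may well cost more than
# it saves; compare the elapsed times on your own machine.
t_serial   <- system.time(lapply(queries, nchar))["elapsed"]
t_parallel <- system.time(parLapply(cl, queries, nchar))["elapsed"]

stopCluster(cl)
```

Both variants return one result per query. In the question's code, by contrast, every task runs the same `for` loop and returns on its first iteration, so each task handles only the first query regardless of `i`.
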