
I would like to parallelize tesseract::ocr using furrr::future_map.

However, I get the error "Detected a non-exportable reference (‘externalptr’ of class ‘tesseract’)", which I guess means that furrr cannot export a tesseract object to the workers.

I read that others have also had problems with furrr and tesseract::ocr. However, I managed to get tesseract::ocr to work with the parallel package.

Is there any chance to get tesseract::ocr working with furrr?

# load library
library(tesseract)
library(furrr)
#> Loading required package: future
library(tidyverse)
library(parallel)

# prepare links
links <- rep("http://jeroen.github.io/images/testocr.png",10)


# function to run OCR on one URL; returns the text,
# a marker string on error, or NULL on warning
f.ocr_save <- function(url) {
  out <- tryCatch(
    {
      tesseract::ocr(url, engine = tesseract("eng"))
    },
    error = function(cond) {
      message("Problem with the saved URL:")
      message("Here's the original error message:")
      message(cond)
      return("error with url")
    },
    warning = function(cond) {
      message("problem with the saved URL")
      message("Here's the original warning message:")
      message(cond)
      return(NULL)
    }
  )
  return(out)
}


# error with furrr
future_map(links,f.ocr_save)
#Error: Detected a non-exportable reference (‘externalptr’ of class ‘tesseract’) in one of the globals (‘tesseract’ of class ‘function’) used in the future expression


# parallel works

cl <- parallel::makeCluster(2) 
clusterEvalQ(cl, {library(tesseract); library(tidyverse)})
#> [[1]]
#>  [1] "forcats"   "stringr"   "dplyr"     "purrr"     "readr"     "tidyr"    
#>  [7] "tibble"    "ggplot2"   "tidyverse" "tesseract" "stats"     "graphics" 
#> [13] "grDevices" "utils"     "datasets"  "methods"   "base"     
#> 
#> [[2]]
#>  [1] "forcats"   "stringr"   "dplyr"     "purrr"     "readr"     "tidyr"    
#>  [7] "tibble"    "ggplot2"   "tidyverse" "tesseract" "stats"     "graphics" 
#> [13] "grDevices" "utils"     "datasets"  "methods"   "base"
clusterExport(cl, c("f.ocr_save"))

parallel::parLapply(cl,
                    links,
                    f.ocr_save)
#> [[1]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#> 
#> [[2]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#> 
#> [[3]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#> 
#> [[4]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#> 
#> [[5]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#> 
#> [[6]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#> 
#> [[7]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#> 
#> [[8]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#> 
#> [[9]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#> 
#> [[10]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"

parallel::stopCluster(cl)

Created on 2022-12-01 with reprex v2.0.2
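In f.ocr_save, tesseract("eng") is called without a namespace prefix, so future's globals scan picks up the tesseract function itself as a global (the error names the global ‘tesseract’ of class ‘function’). A possible furrr version, offered as an untested sketch, qualifies that call as tesseract::tesseract() so that the engine is only built inside each worker. The helper name ocr_one, its lang argument, and the two-worker multisession plan are illustrative assumptions, not part of the original code.

```r
# Sketch only: assumes tesseract, furrr and future are installed and that
# the "eng" language data is available to every worker.
library(future)
library(furrr)

# Two background R sessions, similar to parallel::makeCluster(2)
plan(multisession, workers = 2)

links <- rep("http://jeroen.github.io/images/testocr.png", 10)

# Hypothetical helper: both calls are namespace-qualified, so future should
# not collect the 'tesseract' function as a global; the engine (and its
# external pointer) is created inside the worker and never exported
ocr_one <- function(url, lang = "eng") {
  tryCatch(
    tesseract::ocr(url, engine = tesseract::tesseract(lang)),
    error = function(cond) {
      message("Problem with URL ", url, ": ", conditionMessage(cond))
      "error with url"
    }
  )
}

# Make sure the tesseract package is loaded on each worker
res <- future_map(
  links,
  ocr_one,
  .options = furrr_options(packages = "tesseract")
)

# Return to sequential processing, which shuts down the workers
plan(sequential)
```

Unlike f.ocr_save, this helper has no warning handler, so warnings are relayed to the main session instead of turning a result into NULL.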

ava
  • Can you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – bretauv Dec 01 '22 at 15:23
  • I added a reprex – ava Dec 01 '22 at 19:08
  • The code works fine for me. Did you make sure you have the latest versions of the packages you use? If so, it may be related to your configuration, so can you show your session info? – bretauv Dec 02 '22 at 09:29
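Following up on the configuration question: as far as I understand the future documentation, the check that raises "Detected a non-exportable reference" is controlled by the option future.globals.onReference, whose default is "ignore", so the error would typically only appear when it has been set to "error". The sketch below, which assumes future is installed and that the option can also be set through the R_FUTURE_GLOBALS_ONREFERENCE environment variable, prints the current setting along with the session info requested above. The variable names onref and env_onref are illustrative.

```r
library(future)

# Current setting; an unset option is assumed to behave like "ignore"
onref <- getOption("future.globals.onReference", default = "ignore")
print(onref)

# Assumed environment variable that can set the same option at startup
env_onref <- Sys.getenv("R_FUTURE_GLOBALS_ONREFERENCE", unset = "")
print(env_onref)

# Package versions and platform details, as asked for in the comment
print(sessionInfo())
```

If it reports "error", setting options(future.globals.onReference = "ignore") silences the check, but it does not make an external pointer usable in another R process, so a tesseract engine should still be created on the worker rather than passed to it.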

0 Answers