I would like to parallelize tesseract::ocr using furrr::future_map. However, I get the error "Detected a non-exportable reference (‘externalptr’ of class ‘tesseract’)", which I guess is related to non-exportable objects.
I read that others also have problems with furrr and tesseract::ocr; however, I managed to get tesseract::ocr to work with the parallel package.
Is there any chance to get tesseract::ocr working with furrr?
# load library
library(tesseract)
library(furrr)
#> Loading required package: future
library(tidyverse)
library(parallel)
# prepare links
links <- rep("http://jeroen.github.io/images/testocr.png",10)
# function to save OCR
f.ocr_save <- function(url) {
  out <- tryCatch(
    {
      tesseract::ocr(url, engine = tesseract("eng"))
    },
    error = function(cond) {
      message("Problem with the saved URL:")
      message("Here's the original error message:")
      message(cond)
      return("error with url")
    },
    warning = function(cond) {
      message("problem with the saved URL")
      message("Here's the original warning message:")
      message(cond)
      return(NULL)
    }
  )
  return(out)
}
# error with furrr
future_map(links,f.ocr_save)
#Error: Detected a non-exportable reference (‘externalptr’ of class ‘tesseract’) in one of the globals (‘tesseract’ of class ‘function’) used in the future expression
# parallel works
cl <- parallel::makeCluster(2)
clusterEvalQ(cl, {library(tesseract); library(tidyverse)})
#> [[1]]
#> [1] "forcats" "stringr" "dplyr" "purrr" "readr" "tidyr"
#> [7] "tibble" "ggplot2" "tidyverse" "tesseract" "stats" "graphics"
#> [13] "grDevices" "utils" "datasets" "methods" "base"
#>
#> [[2]]
#> [1] "forcats" "stringr" "dplyr" "purrr" "readr" "tidyr"
#> [7] "tibble" "ggplot2" "tidyverse" "tesseract" "stats" "graphics"
#> [13] "grDevices" "utils" "datasets" "methods" "base"
clusterExport(cl, c("f.ocr_save"))
parallel::parLapply(cl, links, f.ocr_save)
#> [[1]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#>
#> [[2]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#>
#> [[3]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#>
#> [[4]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#>
#> [[5]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#>
#> [[6]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#>
#> [[7]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#>
#> [[8]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#>
#> [[9]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
#>
#> [[10]]
#> [1] "This is a lot of 12 point text to test the\nocr code and see if it works on all types\nof file format.\n\nThe quick brown dog jumped over the\nlazy fox. The quick brown dog jumped\nover the lazy fox. The quick brown dog\njumped over the lazy fox. The quick\nbrown dog jumped over the lazy fox.\n"
parallel::stopCluster(cl)
Created on 2022-12-01 with reprex v2.0.2