
pdf_text() is not releasing RAM. Each time the function runs, it uses more RAM and doesn't free it up until the R session is terminated. I am on Windows.

Minimal example

# This takes ~60 seconds and uses ~500 MB of RAM, which is then unavailable for other processes

library(pdftools)
for (i in 1:5) {
  
  print(i)
  pdf_text("https://cran.r-project.org/web/packages/spatstat/spatstat.pdf")
  
}

My question

Why is pdf_text() using so much memory and how can it be freed up? (without having to terminate the R session)

What I've tried so far

I have tried gc() inside the loop (see the sketch after this list)

I have checked that pdf_text() isn't creating any hidden objects (by inspecting ls(all = TRUE))

I have cleared the R session's temp files
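
Roughly, those attempts looked like this (a minimal sketch of the above; the object name txt is just a placeholder):

library(pdftools)

for (i in 1:5) {
  print(i)
  txt <- pdf_text("https://cran.r-project.org/web/packages/spatstat/spatstat.pdf")
  rm(txt)   # drop the result explicitly
  gc()      # force garbage collection inside the loop
}

ls(all = TRUE)   # nothing hidden shows up

# clearing the session's temp files didn't help either
unlink(list.files(tempdir(), full.names = TRUE), recursive = TRUE)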

Also note

Although the particular PDF in the example above is only about 5 MB, calling pdf_text() on it uses about 20 times that much RAM! I am not sure why.

stevec

2 Answers


This sounds like a memory leak. However, I cannot reproduce the problem on macOS.

I have started an issue to track this. Can you please report which versions of pdftools and libpoppler you are using that show this behavior?
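
In case it helps, both version numbers can be pulled from within R (a minimal sketch, assuming a pdftools build that exports poppler_config()):

packageVersion("pdftools")            # version of the pdftools package
pdftools::poppler_config()$version    # version of libpoppler that pdftools is linked against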

Jeroen Ooms

For anyone arriving here through Google, here's what solved the issue for me. It's built on Jeroen's suggestion here:

pdf_urls <- c("https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf",
              "https://cran.r-project.org/web/packages/dplyr/dplyr.pdf",
              "https://cran.r-project.org/web/packages/pdftools/pdftools.pdf")

pdfs <- list()

for (i in seq_along(pdf_urls)) {

  print(paste("Obtaining pdf", i, "of", length(pdf_urls)))
  pdf_url <- pdf_urls[i]

  # run pdf_text() in a separate R process via callr, so the memory it
  # allocates is released when that process exits
  pdfs[[i]] <- callr::r(function(pdf_path) {
    pdftools::pdf_text(pdf_path)
  }, args = list(pdf_url))

}
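
This sidesteps the leak because callr::r() evaluates the function in a fresh R child process: whatever memory pdf_text() (and the underlying libpoppler) is holding on to is returned to the operating system when that child process exits, so the main session's RAM usage stays flat.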

stevec