Running this code, I received the error "R session aborted: R encountered a fatal error".

I tried uninstalling and reinstalling R 4.3.0 and the most recent version of RStudio. This is the code I ran:

library(pdftools)
library(tesseract)
library(tidyr)
library(tidyverse)
library(tidytext)
require(quanteda)
require(topicmodels)

files <- list.files(ignore.case = TRUE, pattern = "pdf$")
files2 <- lapply(files, pdf_text)

I also tried installing the patched version of R 4.3.0, as was recommended on this forum, but the same thing happened.

Does anyone have any idea what I could be doing wrong? I have used these lines in the past and never had any problems. Previously I ran them on around 300 PDFs; this time it is 1,400. I think that could be the problem, but I am not sure. I am also considering using Python for this analysis, so if that is better, I could try that too.

Thank you!

  • FYI, you should almost always use `library`, not `require`. The latter never stops following code when the package is not available, which is almost never what is intended. Refs: https://stackoverflow.com/a/51263513 – r2evans Apr 25 '23 at 17:09
  • I suggest using `files2 <- lapply(files, function(fn) { message(fn); pdf_text(fn); })` so that you know which file triggered the issue. Once that fails, try again (in a fresh R session, obviously) on just that file and see whether it fails again. If so, start investigating what seems corrupted about that one file (and optionally exclude it when you process the remaining PDFs). – r2evans Apr 25 '23 at 17:12
  • This worked! Thank you so much. There were 2 corrupted files, and now everything works. – Jane_Coding Apr 26 '23 at 09:19
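
For anyone landing here with the same crash, the approach from the comments can be sketched a little more defensively. This is only a minimal sketch under some assumptions: `read_pdfs_logged`, its `reader` argument, and the log file name `pdf_progress.log` are hypothetical names made up for illustration, not part of pdftools. Because a fatal error takes the whole R session down, console messages may be lost, so the sketch also appends each file name to a log on disk before reading it. The `tryCatch` only covers ordinary R errors and cannot intercept a session abort.

```r
# Hypothetical helper (not part of pdftools): read files one at a time,
# recording each file name before it is read.
read_pdfs_logged <- function(files, reader = pdftools::pdf_text,
                             log_file = "pdf_progress.log") {
  results <- vector("list", length(files))
  names(results) <- files
  for (i in seq_along(files)) {
    fn <- files[[i]]
    # Write the name to disk first, so the last line of the log points at
    # the file being read if the R session aborts.
    cat(fn, "\n", file = log_file, append = TRUE, sep = "")
    message(fn)
    # Ordinary R errors are turned into a warning and an NA placeholder.
    # A fatal crash is NOT caught here; check the log after restarting.
    results[[i]] <- tryCatch(
      reader(fn),
      error = function(e) {
        warning("Skipping ", fn, ": ", conditionMessage(e), call. = FALSE)
        NA_character_
      }
    )
  }
  results
}
```

With pdftools installed, this could be called as `files2 <- read_pdfs_logged(files)`. If the session aborts, the last line of `pdf_progress.log` names the file to inspect or exclude before rerunning in a fresh session.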
