Thanks to other articles on this website, I managed to put together a script that will do the following:

  1. Collect the PDF file names from a directory and put them into a list.
  2. Start a data frame using target data from the first PDF in the directory.
  3. Use a loop to add rows to the original data frame, each containing the same target data (pulled from the same section of each PDF).

My first two steps work (code below):

```r
file_names <- list.files(pattern = "*.pdf")
df <-
  extract_tables(
    file = "firstlastname.pdf",
    method = "decide",
    output = "data.frame"
  ) %>%
  pluck(2) %>%
  t() %>%
  as.data.frame() %>%
  slice(2) %>%
  select(1:3) %>%
  rename("inst" = "V1",
         "date" = "V2",
         "field" = "V3")
```

But my final step throws the following error: "Error in pluck(., 2) : object 'tmp' not found"

```r
for (i in file_names)
{
  new <-
    extract_tables(
      file = i,
      method = "decide",
      output = "data.frame"
      ) %>%
    pluck(2) %>%
    t() %>%
    as.data.frame() %>%
    slice(2) %>%
    select(1:3) %>%
    rename("inst" = "V1",
           "date" = "V2",
           "field" = "V3") %>%
    df[nrow(df) + 1, ] <- new
}
```

I am confused because I actually made it all the way through successfully a couple of times, but when I tried again after closing RStudio and coming back, it just wouldn't work anymore. I'm a complete beginner just trying to automate my secretary job a little bit, but I'm probably in way over my head. All I can do is Google things, copy and paste code, and try to understand what everything means and how it comes together.

Unfortunately I can't provide my data files because they contain people's personal information, but the final result is supposed to look like a table with about 50 rows and 3 columns. I did take a photo the first time it worked, though:

[Image: successful data frame]

Thank you for reading. Any tips would be much appreciated!

  • Without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) it's nearly impossible to say what's going wrong. PDF files can be very messy and the data inside isn't always nicely structured. If errors may occur and you want to skip them, consider adding a tryCatch block: https://stackoverflow.com/questions/14748557/skipping-error-in-for-loop – MrFlick Dec 19 '22 at 21:07
  • Also make sure you don't have a `%>%` before the last line in your loop code. You don't want to pipe the table into the assignment. But I'm not sure why "pluck" would be in the error message if that was the only problem with the code. – MrFlick Dec 19 '22 at 21:09
  • Oh my, I think the extra %>% was exactly the problem. Thank you so much for picking up on that! I will likely post a followup question about a next step that I'm working on and will make sure to provide a reproducible example for that. Thanks again! – Hana Peri Dec 19 '22 at 21:30
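
For anyone landing here later, the fix discussed in the comments can be sketched roughly as follows: the pipeline that builds `new` has to end at `rename()`, and the row assignment goes on its own line. Since the original PDFs can't be shared, this sketch swaps in a hypothetical `fake_extract_tables()` in place of `extract_tables()`, and wraps the shared pipeline in a hypothetical helper `tidy_row()`; neither name comes from the question, and the fake table layout is only an assumption about what the second table looks like. The loop also starts from the second file, on the assumption that the first file already seeded `df`.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for extract_tables(): returns a list of tables whose
# second element holds the target values in its second column (an assumption).
fake_extract_tables <- function(file) {
  list(
    data.frame(x = 1),
    data.frame(
      label = c("inst", "date", "field", "other"),
      value = c(paste0("inst_", file), "2022-12-19", "field_a", "x"),
      stringsAsFactors = FALSE
    )
  )
}

# Hypothetical helper: the same pipeline as in the question, ending at rename()
# with no trailing %>%.
tidy_row <- function(tables) {
  tables %>%
    pluck(2) %>%
    t() %>%
    as.data.frame() %>%
    slice(2) %>%
    select(1:3) %>%
    rename("inst" = "V1",
           "date" = "V2",
           "field" = "V3")
}

file_names <- c("a.pdf", "b.pdf", "c.pdf")

# Seed the data frame from the first file.
df <- tidy_row(fake_extract_tables(file_names[1]))

# Add one row per remaining file; the assignment is a separate statement,
# not the last step of the pipe.
for (i in file_names[-1]) {
  new <- tidy_row(fake_extract_tables(i))
  df[nrow(df) + 1, ] <- new
}
```

With real PDFs, `fake_extract_tables(i)` would presumably be replaced by the `extract_tables(file = i, method = "decide", output = "data.frame")` call from the question.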
