I have multiple PDF files saved in a folder. From each file I need to extract the first date in a format like "November 19 2020" and collect the results in a data frame.
Here is the code I am using:
myextr2 <- function(pdffile) {
text_data <- pdf_text(pdffile)
text_collapsed_data <- paste0(text_data, collapse = '\n')
g=stringi::stri_extract( text_collapsed_data, regex = ("(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)/s+/d{1,2}/s+/d{4}")
g[1]
}
files <- list.files(pattern = "pdf$")
pricing = sapply(files, myextr2)
pricing
I am getting the following error:
Error: unexpected '}' in "}"
What is causing this error, and how can I fix it?
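A possible fix, offered as a sketch rather than a definitive answer: the `stringi::stri_extract(` call on the `g=` line is never closed, which is most likely what trips the parser, and the regex uses `/s` and `/d` where an R string needs `\\s` and `\\d`. The helper names `extract_first_date` and `first_date_in_pdf` and the `result` data frame are made up for illustration, and the code assumes the pdftools and stringi packages are installed.

```r
# Month name (full or three-letter), whitespace, 1-2 digit day, whitespace, 4-digit year.
date_pattern <- paste0(
  "(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|",
  "Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)",
  "\\s+\\d{1,2}\\s+\\d{4}"
)

# Hypothetical helper: first matching date in a string, or NA if there is none.
extract_first_date <- function(text) {
  stringi::stri_extract_first_regex(text, date_pattern)
}

# Hypothetical helper: read every page of one PDF and search the combined text.
first_date_in_pdf <- function(pdffile) {
  pages <- pdftools::pdf_text(pdffile)
  extract_first_date(paste0(pages, collapse = "\n"))
}

# One row per PDF in the working directory.
files <- list.files(pattern = "\\.pdf$")
result <- data.frame(
  file = files,
  first_date = vapply(files, first_date_in_pdf, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
result
```

For a quick check without any PDF, `extract_first_date("Issued on November 19 2020, revised Dec 1 2021")` returns "November 19 2020". Using `vapply` instead of `sapply` keeps the result a plain character vector, so files with no matching date show up as NA in the data frame.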