I have nearly one thousand pdf journal articles in a folder. I need to text mine on all article's abstracts from the whole folder. Now I am doing the following:
dest <- "~/A1.pdf"
# set path to pdftotxt.exe and convert pdf to text
exe <- "C:/Program Files (x86)/xpdfbin-win-3.03/bin32/pdftotext.exe"
system(paste("\"", exe, "\" \"", dest, "\"", sep = ""), wait = F)
# get txt-file name and open it
filetxt <- sub(".pdf", ".txt", dest)
shell.exec(filetxt)
By this, I am converting one pdf file to one .txt file and then copying the abstract in another .txt file and compile it manually. This work is troublesome.
How can I read all individual articles from the folder and convert them into .txt file which contain only the abstract from each article. It can be done by limiting the content between ABSTRACT and INTRODUCTION in each article; but I am not able to do so. Any help is appreciated.