
I'm using RStudio Version 1.0.153.

I have a folder of approximately 30 PDFs, and I would like to convert each of them to its own R object as a character string. I already have the pdftools package and it successfully converts a single PDF to an object; I'm just looking for a way to iterate through the PDFs in the folder and automatically assign each one to its own variable.

For example, if I have 30 PDFs named "P1.pdf", "P2.pdf", "P3.pdf", ..., "P30.pdf", how do I get R to convert them all to text using pdftools so that each one becomes its own R object called P1, P2, P3, ..., P30?

Thanks a ton.

I've been learning so much on here!

Meera

MeeraWhy
  • Have you tried anything yourself? Please look at [this](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and consider editing your question to reflect additional info. Generally, people here expect questions to show some effort, as the site is not intended to just write code for people who need help. – shea Oct 10 '17 at 02:17
  • I'm a beginner and I've tried a few things and I'm obviously new to this. I did not post this to have someone write code for me and your comment/s are presumptuous in this regard. I'm still having trouble figuring out how to write iterative code and was asking for help. – MeeraWhy Oct 10 '17 at 09:30
  • I am not presuming anything, that's why I asked you what you did. A lot of new users get the "Welcome to SO, please read . My first comment was nothing different from what a lot of first-time users get from other, more experienced users here. I did not downvote your question; someone else voted it down, probably because of what I pointed out in my comment. My comment was intended to be constructive, sorry you didn't take it that way. If you show your attempted work, someone can point out where you need help and make suggestions for improvement. – shea Oct 10 '17 at 12:58

1 Answer


This could work:

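library(pdftools)

pdf_operations <- function(path) {
  # using whatever operations you may have; at minimum, extract the text
  pdf_text(path)  # returns one character string per page
}

# full.names = TRUE returns paths pdf_text() can open from any working directory
fnames <- dir("./PDF Files", pattern = "\\.pdf$", full.names = TRUE)
sapply(fnames, pdf_operations)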
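Instead of creating 30 separate variables, a common R idiom is to keep all the texts in one named list, so `pdfs$P1` plays the role of the variable `P1`. A minimal sketch, assuming pdftools is installed and the files sit in one folder; `read_pdf_folder` and its `reader` argument are hypothetical names introduced here (the argument only makes it easy to swap in a different extraction function):

```r
# Hypothetical helper: read every .pdf in `folder` into a named list.
# `reader` defaults to pdftools::pdf_text(), which returns one string per page;
# paste(collapse = "\n") joins the pages into a single string per file.
read_pdf_folder <- function(folder, reader = pdftools::pdf_text) {
  paths <- list.files(folder, pattern = "\\.pdf$", full.names = TRUE)
  texts <- lapply(paths, function(p) paste(reader(p), collapse = "\n"))
  # Name each element after its file without the extension: "P1", "P2", ...
  names(texts) <- tools::file_path_sans_ext(basename(paths))
  texts
}

# Usage (assuming the folder exists):
# pdfs <- read_pdf_folder("./PDF Files")
# pdfs$P1        # text of P1.pdf as one string
# pdfs[["P30"]]  # text of P30.pdf
```

Because elements are looked up by name, it does not matter that `list.files()` sorts "P10" before "P2".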
Gautam
  • Why does the OP have to make a new custom function? Did you look to see what functions `pdftools` has? – shea Oct 10 '17 at 02:25
  • In case they want to save some of the data into a separate file (image or another pdf), extract some info, pass it on to another function for cleanup (say for table data - which pdfs do not have a separate format for) or just if they want to name the objects in a certain way. Trying to give the broadest possible solution that would encompass all of the above. – Gautam Oct 10 '17 at 02:29
  • You don't know what the OP tried, so just writing an answer and hoping it's the right answer is the wrong way to answer a question. I tried your solution and from what I can tell, it doesn't work. That's probably because you don't know what the actual function is and what it would / is supposed to do. – shea Oct 10 '17 at 02:40
  • I am familiar with pdftools and I have used it before. The answer is based on the amount of information provided and I've verified it myself. The OP clearly mentioned that they are able to "successfully" convert pdfs into R objects but want a method to do it for a bunch of files. Please re-read the original post and what is being asked therein. The answer also states: this "could" work. Lastly, please refrain from hostile commentary or making assumptions about people posting here - we're all trying to help. If you have a better solution, please feel free to post it. – Gautam Oct 10 '17 at 02:54
  • I don't mean to be hostile, I was merely being direct. But this site isn't built on answers that "could" work. You don't know what the OP knows or doesn't know and so you're guessing at what the answer is. Your answer should include `pdf_text()` or something else. When I used `pdf_text`, the sapply didn't work with the `dir()` output for PDF in one of my directories. – shea Oct 10 '17 at 02:59
  • I maintain the information in my answer is sufficient to work with. You can find an example of a similar approach for reading csv files here: https://stackoverflow.com/questions/40284146/how-to-read-and-name-different-csv-files-in-r/46612637#46612637 – Gautam Oct 10 '17 at 03:03
  • @Marwaha thanks a lot. I'll try to get to this when I hit the office this morning. I'm still learning how to code because I'm an absolute beginner and I'm trying to fit it in between all my other duties as a physician. – MeeraWhy Oct 10 '17 at 09:35
  • @shea Actually, you are being hostile. I'm a beginner that is still learning how to speak in the language in order to frame my question correctly. Your long chastising comments have flooded the response area, keeping people from helping me and keeping me from actually learning to do better. Please move on to actually answering questions that are written according to your particular standards. We have done nothing wrong here. – MeeraWhy Oct 10 '17 at 10:46
  • `pdftotext <- { filelist <- list.files(path = 'C:/Users/2004081/Documents/HCC TXP', all.files = TRUE) pdfs <- lapply(filelist, pdf_text) for(i in 1:length(filelist)) assign(P[i], pdf_text("P[I].pdf")) }` – MeeraWhy Oct 10 '17 at 12:03
  • The above is what I wrote before I submitted the question. I'm wondering what is wrong with the code. – MeeraWhy Oct 10 '17 at 12:03
  • @MeeraWhy I see a problem with how you've used `filelist`, `assign` and `pdf_text`. it's a bit long to explain here, you can reach me at marwaha.mech@gmail.com – Gautam Oct 10 '17 at 12:24
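For reference, here is one way the loop in MeeraWhy's comment could be repaired. This is a sketch under stated assumptions (pdftools installed, all PDFs in one folder), not code from either commenter. A few things stand out in the original: `list.files()` without `full.names = TRUE` returns bare file names, so `pdf_text()` only finds them if the working directory is that folder; `all.files = TRUE` picks up hidden and non-PDF files rather than filtering to PDFs; `P[i]` refers to an object `P` that was never defined, whereas `assign()` needs the new variable's name as a character string; and `"P[I].pdf"` is a literal string, so `i` is never substituted into the file name. The `pdfs <- lapply(...)` line also reads every file once before the loop reads them all again. The wrapper name `read_into_globals` and its `reader` and `envir` arguments are hypothetical.

```r
# Sketch of the loop from the comment with those problems fixed.
# `reader` defaults to pdftools::pdf_text() and can be swapped out for testing.
read_into_globals <- function(folder, reader = pdftools::pdf_text,
                              envir = globalenv()) {
  # full.names = TRUE so each path works regardless of the working directory;
  # the pattern skips anything that is not a .pdf
  filelist <- list.files(path = folder, pattern = "\\.pdf$", full.names = TRUE)
  # seq_along() also avoids the 1:length(x) trap when no files are found
  for (i in seq_along(filelist)) {
    # Build the variable name from the file name: ".../P7.pdf" -> "P7"
    obj_name <- tools::file_path_sans_ext(basename(filelist[i]))
    # assign() takes the name as a string and creates that variable in `envir`
    assign(obj_name, reader(filelist[i]), envir = envir)
  }
  invisible(filelist)
}

# Usage (path from the comment above):
# read_into_globals("C:/Users/2004081/Documents/HCC TXP")
# P1  # text of P1.pdf, one element per page
```

A named list, such as the one returned by `sapply(fnames, pdf_operations)` in the answer, is often easier to loop over later than 30 separate variables, but `assign()` gives exactly the P1 ... P30 objects the question asks for.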