
I'm trying to convert all the PDFs stored in one folder (say, 60 PDFs) into text documents and store them in different folders. Each folder should have a unique name. I tried this code. The folders were created, but the pdftotext conversion command doesn't work in the loop:

import os
def listfiles(path):
    for root, dirs, files in os.walk(path):
        for f in files:
                print(f)
        newpath = r'/home/user/files/'
        p=f.replace("pdf","")
        newpath=newpath+p 
        if not os.path.exists(newpath): os.makedirs(newpath)
        os.system("pdftotext f f.txt")

f=listfiles("/home/user/reports")
Aliya
  • Do you just want to create a text version of each PDF in the same directory as the original, or create the text version in a folder somewhere else? – Matt Mar 24 '15 at 15:44

2 Answers


One problem here is the os.system("pdftotext f f.txt") call: inside the string literal, f is just the letter f, not the loop variable. I assume you want each f replaced with the current file in the loop. If so, change this to os.system("pdftotext {0} {0}.txt".format(f))
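To illustrate the difference, here is a minimal sketch that only builds the command strings and does not run anything. The file name is a made-up example, and the shlex.quote variant is an addition that is not part of the suggestion above; it is there because a name containing spaces would otherwise be split into several arguments by the shell.

```python
import shlex

# Hypothetical file name, chosen to include a space.
f = "annual report.pdf"

# The original call: "f" inside the literal is just the letter f.
literal = "pdftotext f f.txt"

# With str.format, the value of the variable f is substituted.
formatted = "pdftotext {0} {0}.txt".format(f)

# Assumption beyond the answer: quote each argument for the shell.
quoted = "pdftotext {0} {1}".format(shlex.quote(f), shlex.quote(f + ".txt"))

print(literal)
print(formatted)
print(quoted)
```

Note that {0}.txt keeps the original extension, so abc.pdf becomes abc.pdf.txt.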

Another issue may be that the working directory is not being set, so the call to system looks for the file in the wrong place: os.walk yields bare file names, not full paths. Try calling os.chdir every time you change folders.

To place the text file in a different folder, try:

os.system("pdftotext {0} {1}/{0}.txt".format(f, newpath))
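Putting the points above together, the whole loop might look roughly like the sketch below. It is one possible arrangement, not a tested fix for the asker's setup. It assumes Python 3.5 or later and that pdftotext is on the PATH. The function names plan_conversions and convert_all are made up for this example. Instead of os.chdir it joins root and the file name into a full path, and instead of os.system it passes an argument list to subprocess.run, so spaces in file names need no quoting.

```python
import os
import subprocess


def plan_conversions(src_dir, out_root):
    """Yield (pdf_path, out_dir, txt_path) for every PDF found under src_dir."""
    for root, dirs, files in os.walk(src_dir):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() != ".pdf":
                continue  # skip anything that is not a PDF
            out_dir = os.path.join(out_root, stem)  # abc.pdf -> <out_root>/abc
            txt_path = os.path.join(out_dir, stem + ".txt")
            # os.walk gives bare names, so build the full path from root.
            yield os.path.join(root, name), out_dir, txt_path


def convert_all(src_dir, out_root):
    """Create one folder per PDF and run pdftotext into it."""
    for pdf_path, out_dir, txt_path in plan_conversions(src_dir, out_root):
        os.makedirs(out_dir, exist_ok=True)
        # Argument list, no shell involved; check=True raises if pdftotext fails.
        subprocess.run(["pdftotext", pdf_path, txt_path], check=True)
```

With the paths from the question, the call would be convert_all("/home/user/reports", "/home/user/files"). Two PDFs with the same name in different subfolders would end up in the same output folder, so this sketch assumes the names are unique.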
Matt

I don't know Python, but I think I can clearly see a mistake there. It looks like you are just replacing the ".pdf" with a ".txt". Since a PDF isn't just plain text, this won't work. For the conversion, look at the top answer of this post: Python module for converting PDF to text

KJaeg
  • No, @Aliya is using a pdftotext tool called from the command line. – Matt Mar 24 '15 at 15:31
  • I was trying to create a folder named abc for abc.pdf. – Aliya Mar 24 '15 at 15:31
  • I want to call the terminal command pdftotext. For example, pdftotext abc.pdf ff.txt creates a text file ff.txt for the file abc.pdf. I want this to happen for all the PDFs in my folder, and I want it to work in a loop. – Aliya Mar 24 '15 at 15:34
  • In the command line where you call pdftotext, are you currently in the directory where you want to put that text file? Does it help if you give pdftotext a path as a parameter, like "path/to/ff.txt"? How do you know that the conversion doesn't work? Are the files just not listed? Maybe you converted them, but they are now located somewhere else. – KJaeg Mar 24 '15 at 15:41