My code uses pdfminer to convert PDF files to text. I want the output files written to a new folder. Currently each .txt file is written to the same folder as the PDF it was converted from. How do I redirect the output to a different folder? I want the output in a folder called "D:\extracted_text". Code so far:

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from io import StringIO
import glob
import os

def convert(fname, pages=None):
   if not pages:
       pagenums = set()
   else:
       pagenums = set(pages)

   output = StringIO()
   manager = PDFResourceManager()
   converter = TextConverter(manager, output, laparams=LAParams())
   interpreter = PDFPageInterpreter(manager, converter)

   infile = open(fname, 'rb')
   for page in PDFPage.get_pages(infile, pagenums):
       interpreter.process_page(page)
   infile.close()
   converter.close()
   text = output.getvalue()
   output.close()


   outfile = os.path.splitext(os.path.abspath(fname))[0] + '.txt'
   print(outfile)
   with open(outfile, 'w', encoding = 'utf-8') as pdf_file:
       pdf_file.write(text)

   return text


directory = glob.glob(r'D:\files\*.pdf')

for myfiles in directory:
    convert(myfiles)
ajai biltu
  • Try this https://stackoverflow.com/a/8024254/11230028 – Chetan Vashisth Jun 06 '19 at 11:56
  • Possible duplicate of [Redirect output of a function that converts pdf to txt files to a new folder in python](https://stackoverflow.com/questions/56482437/redirect-output-of-a-function-that-converts-pdf-to-txt-files-to-a-new-folder-in) – Jonathan Leffler Jun 09 '19 at 05:09
  • What does `print(outfile)` produce? Look at that to work out what needs to change, and then work backwards to see where what you do get comes from, and thereby work out what you can change to get the result you want. This is basic debugging 101 stuff. – Jonathan Leffler Jun 09 '19 at 05:10
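
Following the debugging hint above: `print(outfile)` shows a path inside `D:\files` because `os.path.splitext(os.path.abspath(fname))[0]` keeps the PDF's own directory. Below is a minimal sketch of one way to change that, assuming it is enough to join the PDF's base name onto the target folder. The helper name `save_text` and its parameters are hypothetical and not part of pdfminer.

```python
import os

def save_text(pdf_path, text, out_dir):
    """Write text to <out_dir>/<pdf base name>.txt and return that path.

    Hypothetical helper for illustration; not part of pdfminer.
    """
    # Create the target folder if it does not exist yet.
    os.makedirs(out_dir, exist_ok=True)
    # Keep only the file name, dropping the PDF's own directory.
    base = os.path.splitext(os.path.basename(pdf_path))[0]
    outfile = os.path.join(out_dir, base + '.txt')
    with open(outfile, 'w', encoding='utf-8') as txt_file:
        txt_file.write(text)
    return outfile
```

Inside `convert`, the lines that build and write `outfile` could then be replaced by a call such as `save_text(fname, text, r'D:\extracted_text')`.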
