
I am trying to convert the text in a PDF file to plain text or HTML, but I keep getting this error: `cannot import name 'process_pdf' from 'pdfminer.pdfinterp'`. How can I fix it?

I first tried this code in Visual Studio, where I got an indentation error caused by the spaces, so I ran it in a Jupyter notebook instead and got the error shown below the code.

from io import StringIO
from pdfminer.pdfinterp import PDFResourceManager , process_pdf
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams



def to_txt(pdf_path):
    input_ = file(pdf_path , 'rb')
    output = StringIO()

    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams = LAParams())
    process_pdf(manager, converter, input_)

    return output.getvalue()

b = to_txt(rb"C:\Users\Jasvinder Singh\Desktop\HACK-IN REPORT.docx")

ImportError: cannot import name 'process_pdf' from 'pdfminer.pdfinterp' (C:\Users\Jasvinder Singh\Anaconda3\lib\site-packages\pdfminer\pdfinterp.py)
  • Welcome to SO! Why do you think `process_pdf` is a method in `pdfminer.pdfinterp`? – vekerdyb Jul 17 '19 at 16:24
  • @vekerdyb I am just getting started in this field, so I searched for code here on Stack Overflow, but the examples I found were 12 months old or older, and I went with one of them. I also looked for the documentation but couldn't find an appropriate site. Thanks for the help. – simarpreetsingh.019 Jul 17 '19 at 17:40

1 Answer


Please see the documentation and this comment on a bug.

The process_pdf function has been replaced by PDFPage.get_pages(): iterate over the pages it yields and pass each one to PDFPageInterpreter.process_page().

vekerdyb