I am trying to convert PDF to text in Python. But it is giving me an error:
PDFTextExtractionNotAllowed: Text extraction is not allowed: <_io.BufferedReader name='C:\Users\Downloads\Facts_for_2017.pdf'>
Code which I am using is:
import sys
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import XMLConverter, HTMLConverter, TextConverter
from pdfminer.layout import LAParams
import io
def pdfparser(data):
fp = open(data, 'rb')
rsrcmgr = PDFResourceManager()
retstr = io.StringIO()
codec = 'utf-8'
laparams = LAParams()
device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)
for page in PDFPage.get_pages(fp):
interpreter.process_page(page)
data = retstr.getvalue()
return data
if __name__ == '__main__':
text = pdfparser(Input_path)
Can anyone help me?
File path is:
https://drive.google.com/file/d/1RyR-J-EwMywL6BqsYbl4Ocm96VzCYrM7/view?usp=sharing