0

I just need to know whether a scanned PDF file can be converted to an editable PDF file using Python. I know of a couple of libraries, such as pytesseract and pyocr. Any guidance would be highly appreciated. Thanks.

Umar Aftab

2 Answers

0

A scanned PDF document is a set of page images combined into one file and saved in PDF format. In such a file you cannot select a single letter of the text, and you cannot search for it either.

I faced the same problem and handled it with a few lines of code using ocrmypdf. It converts a scanned PDF file into a PDF document with selectable, searchable text. Hope it works for you!

import ocrmypdf

def scannedPdfConverter(file_path, save_path):
    # skip_text=True skips OCR on pages that already contain text
    ocrmypdf.ocr(file_path, save_path, skip_text=True)
    print('File converted successfully!')
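
If you have a whole folder of scans, the same call can be applied in a loop. This is only a rough sketch: the helper names `searchable_name` and `convert_folder` and the `_ocr.pdf` suffix are my own assumptions, not part of ocrmypdf, and it assumes ocrmypdf and Tesseract are installed.

```python
from pathlib import Path


def searchable_name(pdf_path, out_dir):
    """Build the output path for one scanned PDF (hypothetical naming scheme)."""
    pdf_path = Path(pdf_path)
    return Path(out_dir) / f"{pdf_path.stem}_ocr.pdf"


def convert_folder(in_dir, out_dir):
    """Run OCR on every PDF directly inside in_dir; return the written paths."""
    import ocrmypdf  # imported here so the helper above works without it

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf_path in sorted(Path(in_dir).glob("*.pdf")):
        target = searchable_name(pdf_path, out_dir)
        # skip_text=True skips OCR on pages that already contain text
        ocrmypdf.ocr(pdf_path, target, skip_text=True)
        converted.append(target)
    return converted
```

Calling `convert_folder('scans', 'searchable')` would then write one `*_ocr.pdf` per input file into `searchable`, leaving the originals untouched.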

-3

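import os
import subprocess

# Walk the folder tree and convert every PDF to .doc with LibreOffice Writer.
for top, dirs, files in os.walk('/my/pdf/folder'):
    for filename in files:
        if filename.endswith('.pdf'):
            abspath = os.path.join(top, filename)
            # Pass the arguments as a list (no shell) so file names
            # containing spaces or quotes are handled safely.
            subprocess.call(['lowriter', '--invisible',
                             '--convert-to', 'doc', abspath])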

Referenced from here

You should look before you ask.

DUDANF