How to convert a scanned PDF file to an editable PDF file with Python?

Question

I just need to know whether a scanned PDF file can be converted to an editable PDF file using Python. I know of a couple of libraries, such as pytesseract and pyocr. Any guidance would be highly appreciated. Thanks.

score 0 · Answer 1 · answered Aug 19 '22 at 16:31

A scanned PDF is a set of page images combined into one document and saved in PDF format. Because the pages are images, you cannot select even a single letter of the text or search for it.

I faced the same problem and handled it with a few lines of code using ocrmypdf. The function below converts a scanned PDF file into a PDF document with selectable, searchable text. Hope it works for you!

import ocrmypdf

def scannedPdfConverter(file_path, save_path):
    # skip_text=True skips pages that already contain text instead of raising an error
    ocrmypdf.ocr(file_path, save_path, skip_text=True)
    print('File converted successfully!')
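If you have a whole folder of scans, the same call can be wrapped in a small batch helper. This is only a rough sketch, not something from the ocrmypdf docs: the names convert_folder and default_converter are made up for illustration, and it assumes ocrmypdf (plus its Tesseract dependency) is installed when the default converter is actually used.

```python
from pathlib import Path


def default_converter(src, dst):
    # Imported lazily so this helper can be loaded without ocrmypdf installed.
    import ocrmypdf
    # Same call as in the answer above; skip_text=True skips pages that already have text.
    ocrmypdf.ocr(src, dst, skip_text=True)


def convert_folder(src_dir, dst_dir, convert=default_converter):
    """Run convert on every .pdf under src_dir, writing results under dst_dir.

    Returns (converted, failed): a list of output paths and a list of
    (source path, exception) pairs for files that could not be converted.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    converted, failed = [], []
    # sorted() collects the file list up front, before any output is written.
    for src in sorted(src_dir.rglob('*.pdf')):
        # Mirror the source folder layout under dst_dir.
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            convert(src, dst)
            converted.append(dst)
        except Exception as exc:
            # Record the failure and keep going with the remaining files.
            failed.append((src, exc))
    return converted, failed
```

You would call it as convert_folder('/my/scans', '/my/searchable') and then check the failed list for anything that needs a second look. The convert argument is only there so the OCR step can be swapped out, for example with a stub when testing.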

score -3 · Answer 2 · answered Nov 03 '19 at 20:18

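import os
import subprocess

# Walk the folder tree and convert each PDF to .doc with LibreOffice Writer.
for top, dirs, files in os.walk('/my/pdf/folder'):
    for filename in files:
        if filename.endswith('.pdf'):
            abspath = os.path.join(top, filename)
            # Pass the arguments as a list so file names with quotes or spaces are handled safely.
            subprocess.call(['lowriter', '--invisible', '--convert-to', 'doc', abspath])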

Referenced from here

You should look before you ask.
