I just need to know if we can convert a scanned pdf file to an editable pdf file using python. I know couple of libraries out there like pytesseract, pyocr
. Guidance in this regard will be highly appreciated. Thanks
Asked
Active
Viewed 1,635 times
0

Umar Aftab
- 147
- 4
- 15
2 Answers
0
A scanned pdf document (number of images combined into one pdf document) saved in pdf format. But in this pdf file, unable to select a single letter of the text and even search for this letter also.
I also faced the same problem. Hence I have handled this with 3 lines of code. It converts scanned pdf files into a select and searchable text pdf document. Hope it works for you!
import ocrmypdf
def scannedPdfConverter(file_path, save_path):
ocrmypdf.ocr(file_path, save_path, skip_text=True)
print('File converted successfully!')

Ajay Verma
- 1
- 2
-3
import os
import subprocess
for top, dirs, files in os.walk('/my/pdf/folder'):
for filename in files:
if filename.endswith('.pdf'):
abspath = os.path.join(top, filename)
subprocess.call('lowriter --invisible --convert-to doc "{}"'
.format(abspath), shell=True)
Referenced fromn here
You should look before you ask.

DUDANF
- 2,618
- 1
- 12
- 42