Questions tagged [ocrmypdf]
19 questions
2
votes
2 answers
How do I write a batch process command using gnu parallel?
I'm trying to do some batch processing using a package called ocrmypdf.
Here is a command that can process 1 pdf file
ocrmypdf input.pdf output.pdf
and here is a command that can process all pdf files in the directory we run it in.
parallel --tag -j…
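For comparison, the same kind of batch job can be sketched in Python instead of GNU parallel. This is only a sketch: it assumes the `ocrmypdf` executable is on `PATH`, and the helper names `build_jobs` and `run_batch`, the output folder, and the worker count are illustrative choices that do not come from the question.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_jobs(input_dir, output_dir):
    """Return one ocrmypdf command (as an argument list) per PDF in input_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return [
        ["ocrmypdf", str(pdf), str(output_dir / pdf.name)]
        for pdf in sorted(input_dir.glob("*.pdf"))
    ]


def run_batch(input_dir, output_dir, workers=4):
    """Run the commands concurrently; returns a list of (command, return code)."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    jobs = build_jobs(input_dir, output_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = pool.map(lambda cmd: subprocess.run(cmd).returncode, jobs)
        return list(zip(jobs, codes))
```

Calling `run_batch(".", "ocr_output")` would process every PDF in the current directory and write the results to a hypothetical `ocr_output` folder. Passing each command as a list avoids shell quoting problems with file names that contain spaces.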

SkV
2
votes
1 answer
Camelot Cannot extract entire table
I'm using Camelot to extract table information from a PDF that I have converted from scanned to searchable using ocrmypdf (500 dpi).
Camelot seems to be able to identify the table and extract most of the data within the table but it seems to be unable…

Douglas Griffin
1
vote
0 answers
Tesseract OCR cannot reliably recognize the diameter sign ⌀
I have a technical drawing in PDF format and I want to search it for very specific values, especially the diameter sign. I use ocrmypdf, which itself uses Tesseract OCR; sometimes it gets the sign right and sometimes it doesn't, but I can't…

Nick Stankat
1
vote
1 answer
ocrmypdf - could not find source-pdf?
I would like to use ocrmypdf to convert a PDF file made from a picture into a readable PDF.
I tried it with the following simple code:
(invoice.pdf is of course available in the same path as the Python script, and output.pdf should be…
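One possible cause of a "source not found" error is that relative paths are resolved against the current working directory rather than the folder containing the script. The sketch below builds an absolute path beside the script and fails early with a clear message. The helper names `resolve_next_to` and `require_input` are hypothetical, and whether this matches the asker's actual problem is an assumption, since the excerpt is truncated.

```python
from pathlib import Path


def resolve_next_to(script_file, filename):
    """Return the absolute path of filename located beside script_file."""
    return Path(script_file).resolve().parent / filename


def require_input(path):
    """Raise early, with the absolute path in the message, if the input is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Input PDF not found: {path.resolve()}")
    return path
```

In a script this might be used as `src = require_input(resolve_next_to(__file__, "invoice.pdf"))`, with `src` then passed to `ocrmypdf.ocr`.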

Rapid1898
0
votes
1 answer
ocrmypdf not working when using the docker image and the docker java client over a binding volume
When running an ocrmypdf docker container, all I get is the following message:
ocrmypdf: error: unrecognized arguments: 64ee37a6fc66cf591ce4a35f-1.png_OCR.pdf
This is what my docker inspect container shows:
"Args": [
"ocrmypdf",
…

Luis Lavieri
0
votes
0 answers
UI not able to cancel the ocrmypdf.ocr()
I created a GUI with Python and a few other required libraries. My task is to convert a non-searchable PDF into a searchable PDF, save it as a new PDF, extract a word from it, and save the file under that name. I integrated this code with the GUI…
0
votes
1 answer
ocrmypdf not able to find tesseract path
The issue is that ocrmypdf is not able to find the Tesseract engine path, even though I have added it to the environment variables. I need a quick solution: is it possible to supply the path explicitly to the ocrmypdf.ocr() function in Python?
Even imported…
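OCRmyPDF locates external programs such as Tesseract through `PATH`, so one workaround that is sometimes suggested is to extend `PATH` inside the Python process before calling `ocrmypdf.ocr()`. The sketch below shows only that step. The helper names are hypothetical, and the directory passed in (for example a Tesseract install folder) is an assumption that depends on the machine.

```python
import os
import shutil


def prepend_to_path(directory):
    """Put directory at the front of PATH for this process and its children."""
    current = os.environ.get("PATH", "")
    os.environ["PATH"] = (directory + os.pathsep + current) if current else directory
    return os.environ["PATH"]


def tesseract_location():
    """Return the full path of the tesseract executable, or None if not found."""
    return shutil.which("tesseract")
```

After `prepend_to_path(...)` with the real install folder, `tesseract_location()` returning a path rather than `None` is a quick check that the lookup now succeeds in the same process.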
0
votes
0 answers
Making table data in a PDF searchable using the OCRmyPDF module
1. Create Python OCR function
import ocrmypdf
def ocr(file_path, save_path):
    ocrmypdf.ocr(file_path, save_path)
2. Call and use the function.
ocr("input.pdf", "output.pdf")
Using the OCRmyPDF module, I am not able to make the table searchable.
How can…

rohit kumar
0
votes
0 answers
How to tell OCRmyPDF to work on ONLY 25% of a page
Please help.
I am planning to use OCRmyPDF, but only to extract the drawing block at the bottom right. The entire drawing is pretty big. Can I scan only the 25% at the bottom right?
Thank you.
I read the OCRmyPDF documentation but have had no success so far.
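If no built-in region option turns up, one possible workaround is to crop each page to the region of interest with a separate PDF or image tool and run OCR on the cropped result. The sketch below covers only the arithmetic for the crop box. It assumes "25%" means a quarter of the page area (right half, bottom half), and the function name is hypothetical.

```python
def bottom_right_quarter(width, height, origin="bottom-left"):
    """Return (x0, y0, x1, y1) for the bottom-right quarter of a page.

    origin="bottom-left": y grows upward (default PDF user space).
    origin="top-left": y grows downward (typical for raster images).
    """
    x0, x1 = width / 2, width
    if origin == "bottom-left":
        y0, y1 = 0, height / 2
    elif origin == "top-left":
        y0, y1 = height / 2, height
    else:
        raise ValueError(f"unknown origin: {origin!r}")
    return (x0, y0, x1, y1)
```

The `origin` switch matters because PDF libraries and image libraries often disagree on where y = 0 lies, so the same visual region has different coordinates in each.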

Also Wa
0
votes
0 answers
Running ocrmypdf with tesseract, ghostscript on windows without admin rights
I have built a python script based on ocrmypdf which requires both tesseract and ghostscript to be installed locally.
This script is to be run on a laptop without administrative rights and hence I will not be able to install tesseract and…

John Jam
0
votes
0 answers
OcrMyPdf Python: Permission denied: 'unpaper'
I'm trying to use the OCRmyPDF library and here is my code:
ocrmypdf.ocr("input/mypdf.pdf",
             "input/mypdf_ocr.pdf",
             skip_text=False,
             force_ocr=True,
             deskew=True,
             rotate_pages=True,
…
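A "Permission denied" error for `unpaper` may mean that a file with that name exists on `PATH` but lacks execute permission. That is a guess, since the excerpt is truncated. The sketch below lists every matching file on `PATH` together with whether the current user may execute it. The function name is hypothetical, and the lookup is POSIX-style (it does not try Windows extensions such as `.exe`).

```python
import os


def find_candidates(name, path=None):
    """List (file, is_executable) for every file called name on the search path."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if directory and os.path.isfile(candidate):
            found.append((candidate, os.access(candidate, os.X_OK)))
    return found
```

An entry such as `(some_path, False)` from `find_candidates("unpaper")` would point at a file that is found first but cannot be run, while an empty list would suggest the program is not installed where Python can see it.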

Dalireeza
0
votes
0 answers
Heroku: deploy an app which uses ocrmypdf
I need to deploy my Node.js web server, which uses ocrmypdf. I chose Heroku. Currently I use these Heroku buildpacks:
1. heroku/python
2. https://github.com/heroku/heroku-buildpack-apt
3.…
0
votes
0 answers
Snapd Install Ocrmypdf on CentOS 7.6
I installed ocrmypdf on CentOS 7.6 using "snapd install ocrmypdf". The installation completed successfully. However, when I execute the command "ocrmypdf input.pdf output.pdf", it always says "InputFileError, input.pdf cannot be found". How can…

ThunderBird
0
votes
0 answers
Transform text contents of a PDF
I have a PDF with multiple text blocks which are misaligned. I am trying to generate a new PDF with aligned text as per my transformation matrix (known). I can use PyMuPDF (fitz) to extract the text information from the source PDF and insert the…

asymptote
0
votes
0 answers
Pycharm debugger doesn't work properly with system commands
I'm trying to debug a program with the following command
os.system('ocrmypdf -l por --force-ocr --pages 1 \"' + dirname + '/' + pdf_name + '\" \"' + ocr_dir + str(index) + '.pdf\"')
When I run the code it works, but in the debugger it displays the…
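Independently of the debugger behaviour, the same call can be written with `subprocess.run` and an argument list, which removes the manual quoting and makes the command's output available for inspection. This is a sketch, not a confirmed fix for the debugger issue. The function names are hypothetical; the flags and the way the paths are joined are copied from the command in the question.

```python
import subprocess


def build_ocr_command(dirname, pdf_name, ocr_dir, index):
    """Build the same ocrmypdf invocation as an argument list (no shell quoting)."""
    return [
        "ocrmypdf",
        "-l", "por",
        "--force-ocr",
        "--pages", "1",
        f"{dirname}/{pdf_name}",
        f"{ocr_dir}{index}.pdf",
    ]


def run_ocr(dirname, pdf_name, ocr_dir, index):
    """Run the command, capturing stdout and stderr as text for later inspection."""
    return subprocess.run(
        build_ocr_command(dirname, pdf_name, ocr_dir, index),
        capture_output=True,
        text=True,
    )
```

The object returned by `run_ocr` carries `returncode`, `stdout`, and `stderr`, so a failing run can be examined from a breakpoint instead of relying on what `os.system` prints to the console.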