Questions tagged [ocrmypdf]

19 questions
2
votes
2 answers

How do I write a batch process command using GNU parallel?

I'm trying to do some batch processing with a package called ocrmypdf. This command processes a single PDF file: ocrmypdf input.pdf output.pdf. And this command processes all PDF files in the directory it is run from: parallel --tag -j…
SkV
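For the batch-processing question above, a minimal sketch in Python rather than GNU parallel, since the excerpt's `parallel` command is truncated. It builds one `ocrmypdf input.pdf output.pdf` command (the single-file form quoted in the excerpt) per PDF in a directory and runs them concurrently. The helper names `plan_batch` and `run_batch`, the `ocr_out` directory, and the worker count are hypothetical, not taken from the question.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def plan_batch(src_dir, out_dir):
    """Return one ['ocrmypdf', input, output] command per PDF in src_dir."""
    src, out = Path(src_dir), Path(out_dir)
    return [
        ["ocrmypdf", str(pdf), str(out / pdf.name)]
        for pdf in sorted(src.glob("*.pdf"))  # non-recursive, sorted by name
    ]


def run_batch(src_dir, out_dir="ocr_out", workers=4):
    """Run the planned commands concurrently; return exit codes in plan order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = plan_batch(src_dir, out_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker thread only waits on an external ocrmypdf process.
        results = pool.map(lambda cmd: subprocess.run(cmd).returncode, commands)
        return list(results)
```

Calling `run_batch(".")` would process every PDF in the current directory, assuming the `ocrmypdf` executable is installed and on `PATH`.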
2
votes
1 answer

Camelot cannot extract entire table

I'm using Camelot to extract table information from a PDF that I have converted from scanned to searchable using ocrmypdf (500 dpi). Camelot seems to be able to identify the table and extract most of the data within it, but it seems to be unable…
1
vote
0 answers

Tesseract OCR does not recognize the diameter sign ⌀ reliably

I have a technical drawing in PDF format and I want to search it for very specific values, especially the diameter sign. I use ocrmypdf, which itself uses Tesseract OCR; sometimes it gets it right and sometimes it doesn't, but I can't…
Nick Stankat
1
vote
1 answer

ocrmypdf - could not find source-pdf?

I would like to use ocrmypdf to convert a PDF file from a picture to a readable PDF. I tried it with the following simple code (invoice.pdf is of course available in the same path as the Python script, and output.pdf should be…
Rapid1898
0
votes
1 answer

ocrmypdf not working when using the Docker image and the Docker Java client over a binding volume

When running an ocrmypdf Docker container, all I get is the following message: ocrmypdf: error: unrecognized arguments: 64ee37a6fc66cf591ce4a35f-1.png_OCR.pdf. This is what docker inspect shows for my container: "Args": [ "ocrmypdf", …
Luis Lavieri
0
votes
0 answers

UI is not able to cancel ocrmypdf.ocr()

I created a GUI with Python and a few other required libraries. My task is to convert a non-searchable PDF to a searchable PDF, save it as a new PDF, extract a word from it, and save the file under that name. I integrated this code with the GUI…
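One possible direction for the cancellation question above, offered as a sketch and not as the asker's code: run the OCR job as a child process so that a GUI cancel button can stop it, instead of calling `ocrmypdf.ocr()` inside the GUI process. This assumes the job can be expressed as a command line such as `ocrmypdf input.pdf output.pdf`; the names `start_job` and `cancel_job` and the grace period are hypothetical.

```python
import subprocess


def start_job(command):
    """Start a long-running command as a child process and return its handle."""
    return subprocess.Popen(command)


def cancel_job(proc, grace_seconds=5.0):
    """Ask the child to stop; kill it if it has not exited after grace_seconds."""
    if proc.poll() is None:  # child is still running
        proc.terminate()
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode
```

A cancel button handler would call `cancel_job(proc)` on the handle returned by `start_job(["ocrmypdf", "input.pdf", "output.pdf"])`; the GUI can poll `proc.poll()` to learn when the job has finished on its own.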
0
votes
1 answer

ocrmypdf not able to find tesseract path

The issue is that ocrmypdf is not able to find the Tesseract engine path even though I have added it to the environment variables. So I need a quick solution: is it possible to pass the path explicitly to the ocrmypdf.ocr() function in Python? Even imported…
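A minimal sketch related to the question above, assuming ocrmypdf locates the `tesseract` executable through the `PATH` of the running Python process. It prepends a directory to `PATH` before OCR is invoked. The helper name `prepend_to_path` is hypothetical, and this is not a documented ocrmypdf parameter.

```python
import os


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    current = env.get("PATH", "")
    # Drop empty entries and any earlier copy of the directory to avoid duplicates.
    parts = [p for p in current.split(os.pathsep) if p and p != directory]
    env["PATH"] = os.pathsep.join([directory] + parts)
    return env["PATH"]
```

The call would go before `ocrmypdf.ocr(...)`, with the directory that contains the Tesseract executable on the machine in question.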
0
votes
0 answers

Making table data in a PDF searchable using the OCRmyPDF module

1. Create a Python OCR function: import ocrmypdf def ocr(file_path, save_path): ocrmypdf.ocr(file_path, save_path) 2. Call and use the function: ocr("input.pdf","output.pdf") Using the OCRmyPDF module I am not able to make the table searchable. How can…
0
votes
0 answers

How to tell OCRmyPDF to work on ONLY 25% of a page

Please help. I am planning to use OCRmyPDF, but only to extract the drawing block at the bottom right. The entire drawing is pretty big. Can I scan only the 25% at the bottom right? Thank you. I read the OCRmyPDF documentation but have had no success so far.
0
votes
0 answers

Running ocrmypdf with Tesseract and Ghostscript on Windows without admin rights

I have built a Python script based on ocrmypdf, which requires both Tesseract and Ghostscript to be installed locally. This script is to be run on a laptop without administrative rights, and hence I will not be able to install Tesseract and…
John Jam
0
votes
0 answers

OcrMyPdf Python: Permission denied: 'unpaper'

I'm trying to use the ocrMyPdf library, and here is my code: ocrmypdf.ocr("input/mypdf.pdf", "input/mypdf_ocr.pdf", skip_text=False, force_ocr=True, deskew=True, rotate_pages=True, …
Dalireeza
0
votes
0 answers

Heroku: deploy an app which uses ocrmypdf

I need to deploy my Node.js web server, which uses ocrmypdf. I chose Heroku. Currently I use these Heroku buildpacks: 1. heroku/python 2. https://github.com/heroku/heroku-buildpack-apt 3.…
0
votes
0 answers

Snapd Install Ocrmypdf on CentOS 7.6

I installed ocrmypdf on CentOS 7.6 using "snapd install ocrmypdf". The installation completed successfully. However, when I execute the command "ocrmypdf input.pdf output.pdf", it always says "InputFileError, input.pdf cannot be found". How can…
0
votes
0 answers

Transform text contents of a PDF

I have a PDF with multiple text blocks which are misaligned. I am trying to generate a new PDF with aligned text as per my transformation matrix (known). I can use PyMuPDF (fitz) to extract the text information from the source PDF and insert the…
asymptote
0
votes
0 answers

Pycharm debugger doesn't work properly with system commands

I'm trying to debug a program with the following command: os.system('ocrmypdf -l por --force-ocr --pages 1 \"' + dirname + '/' + pdf_name + '\" \"' + ocr_dir + str(index) + '.pdf\"'). When I run the code it works, but in the debugger it displays the…
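The hand-quoted `os.system` string in the excerpt above can also be written as an argument list for `subprocess.run`, which avoids the escaped quotes and the shell. This is a sketch of that rewrite only, not a diagnosis of the debugger behaviour, which the excerpt truncates. The flags and path concatenation are copied from the excerpt; the function names `build_ocr_command` and `run_ocr` are hypothetical.

```python
import subprocess


def build_ocr_command(dirname, pdf_name, ocr_dir, index):
    """Build the argument list equivalent to the os.system string in the excerpt."""
    source = dirname + "/" + pdf_name
    target = ocr_dir + str(index) + ".pdf"  # same concatenation as the original
    return ["ocrmypdf", "-l", "por", "--force-ocr", "--pages", "1", source, target]


def run_ocr(dirname, pdf_name, ocr_dir, index):
    """Run the command without a shell, so paths with spaces need no quoting."""
    command = build_ocr_command(dirname, pdf_name, ocr_dir, index)
    return subprocess.run(command).returncode
```

Because each path is passed as its own list element, a directory name containing spaces is handed to ocrmypdf unchanged.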