7

I want to get the orientation of a scanned document. I saw the post "Pytesseract OCR multiple config options" and tried to use --psm 0 to get the orientation:

target = pytesseract.image_to_string(text, lang='eng', boxes=False, \
config='--psm 0 tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz')

But I get an error:

FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/jy/np7p4twj4bx_k396hyc_bnxw0000gn/T/tess_dzgtpadd_out.txt'
Shaido
lads

3 Answers

10

I found another way to get the orientation using pytesseract:

print(pytesseract.image_to_osd(Image.open(file_name)))

This is the output:

Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 21.27
Script: Latin
Script confidence: 4.14
lads
  • It can detect script or font? What if the document contains different font? – alyssaeliyah Jun 19 '19 at 07:20
  • This is a good solution, but found that it's not very accurate. In a small experiment I did on 9 rotated (right, left, down) PNG document pages, it detected the rotation correctly on only 6. – arun Aug 19 '21 at 20:56
8

Instead of writing a regex to extract values from the output string, pass output_type=Output.DICT to get the result as a dict:

import cv2
import pytesseract
from pytesseract import Output

im = cv2.imread(str(imPath), cv2.IMREAD_COLOR)
newdata = pytesseract.image_to_osd(im, output_type=Output.DICT)

The sample output looks as follows; use the dict keys to access the values:

{
    'page_num': 0,
    'orientation': 90,
    'rotate': 270,
    'orientation_conf': 1.2,
    'script': 'Latin',
    'script_conf': 1.11
}
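
Since the confidence in the sample above is low, it may be worth checking 'orientation_conf' before acting on 'rotate'. A minimal sketch of that idea: rotation_to_apply is a hypothetical helper, and the min_conf threshold of 2.0 is an arbitrary assumption to tune on your own documents, not a value recommended by pytesseract or Tesseract.

```python
def rotation_to_apply(osd, min_conf=2.0):
    """Return the 'rotate' angle, or 0 when the orientation confidence is low.

    `osd` is a dict shaped like the Output.DICT result shown above.
    `min_conf` is an assumed threshold, not a library default.
    """
    if osd.get("orientation_conf", 0.0) < min_conf:
        return 0
    return osd.get("rotate", 0) % 360


# Sample dict copied from the output above.
sample = {
    "page_num": 0,
    "orientation": 90,
    "rotate": 270,
    "orientation_conf": 1.2,
    "script": "Latin",
    "script_conf": 1.11,
}

print(rotation_to_apply(sample))  # confidence 1.2 is below 2.0, so prints 0
```

With a higher confidence value in the dict, the same call would return the 'rotate' angle unchanged.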
Mahesh Kumaran
3

@lads has already mentioned the method which can find the orientation. I have just used re to extract how many degrees the image needs to be rotated.

import re
import cv2
import pytesseract

imPath = 'path_to_image'
im = cv2.imread(str(imPath), cv2.IMREAD_COLOR)
newdata = pytesseract.image_to_osd(im)
re.search(r'(?<=Rotate: )\d+', newdata).group(0)
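
Note that group(0) returns a string, and calling it raises AttributeError if the "Rotate:" line is missing. A small sketch that guards against both: rotation_from_osd is a hypothetical helper name, and the sample text is abbreviated from the image_to_osd output quoted in the first answer.

```python
import re


def rotation_from_osd(osd_text):
    """Return the 'Rotate' value as an int, or None if the line is missing."""
    match = re.search(r"(?<=Rotate: )\d+", osd_text)
    return int(match.group(0)) if match else None


# In practice the text would come from pytesseract.image_to_osd(im).
sample = "Page number: 0\nOrientation in degrees: 270\nRotate: 90\n"

print(rotation_from_osd(sample))         # 90
print(rotation_from_osd("no osd here"))  # None
```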
Mousam Singh