I would like to determine the skew angle of the text in a PDF document so that I can straighten the document. The PDFs I receive are scanned by hand, so they are rarely aligned perfectly. Some pages are completely upside down, some are tilted slightly, and some are rotated by about 100-120 degrees. In general, the input can be rotated by any angle from 0 to 360 degrees.

The content of the document is printed text and tables (handwritten text may occur, but we will not take this into account).

As I understand it, to determine the angle of the whole document (so that I can straighten it), I need to determine the angle of the text. However, there is a catch: if the document is rotated 180 degrees, the text lines still measure as 0 degrees, yet I somehow have to detect that the document needs to be turned over.

I have been looking for a solution for quite a long time, but I have not found anything that works acceptably. I would really appreciate the community's help!

Below are examples of the documents I receive (converted to images beforehand, since I could not upload the PDFs here):

[1]: https://i.stack.imgur.com/muGJN.png [2]: https://i.stack.imgur.com/yTvzM.png

[3]: https://i.stack.imgur.com/gTy6B.jpg [4]: https://i.stack.imgur.com/shrzq.png

Christoph Rackwitz
Paul

2 Answers


See this related question: Detect if an OCR text image is upside down

It shows how to detect the rotation angle needed to align your image.

diyImma
  • Yes, I saw this example quite a while ago. But I don't think it helped me, I don't remember why. But I'll try it again – Paul Aug 03 '23 at 11:22
  • Yes, I applied this code for my case and remembered why I abandoned this idea: I still could not determine the angle. If you tell me, based on that code, how to determine the angle of the text, I will be very grateful to you – Paul Aug 03 '23 at 12:30

You can do it with a deep text detection library like CRAFT-pytorch and a bit of post-processing:

  1. Using CRAFT, detect the text and find the rotation angle of each line or word in the image.
  2. Then post-process those angles to get the dominant rotation angle of the whole document.

I will explain the steps in detail:

  1. First clone the CRAFT repo and install its dependencies: pip install -r requirements.txt
  2. Then post-process the output of the test_net call in test.py (currently around line 162 of the file):
# test.py file (run it with --refine option to detect the lines)
# ...
# ...
# ...
#       bboxes, polys, score_text = test_net(net, image, args.text_threshold, args.link_threshold, args.low_text, args.cuda, args.poly, refine_net)

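        # Post-process the polys variable here to get the dominant angle.
        # Assumes cv2 and numpy (as np) are already imported at the top of test.py.

        angles = []
        for poly in polys:
            # minAreaRect returns ((cx, cy), (w, h), angle)
            _, (w, h), angle = cv2.minAreaRect(np.asarray(poly, dtype=np.float32))
            # The angle refers to the side OpenCV calls "width"; measure along
            # the longer side instead, since a text line is wider than it is tall.
            if w < h:
                angle += 90
            angles.append(angle)

        # The median ignores a few badly detected boxes.
        doc_angle = np.median(angles)
        print("doc_angle", doc_angle)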

Thanks to CRAFT the angle is fairly accurate, but one ambiguity remains: the detected angles are the orientations of text lines, and a line has no direction. The angle computed here may therefore be off by 180 degrees!
So you end up with two candidate angles, one of which is correct: doc_angle and (180 + doc_angle).
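Because line angles are only defined modulo 180 degrees, a plain median can also go wrong when the lines sit near the wrap-around point (for example 85 and -85 describe almost the same direction, yet their median is 0). Here is a minimal sketch of one way around that, using the usual angle-doubling trick plus a median of the residuals. The function names are my own and are not part of CRAFT or OpenCV.

```python
import numpy as np


def wrap_half_turn(deg):
    """Wrap angles in degrees into the interval [-90, 90)."""
    return (np.asarray(deg, dtype=float) + 90.0) % 180.0 - 90.0


def dominant_line_angle(angles_deg):
    """Dominant orientation of undirected lines, in degrees within [0, 180)."""
    a = np.asarray(angles_deg, dtype=float)
    if a.size == 0:
        raise ValueError("no angles given")
    # Doubling the angles makes directions that differ by 180 degrees coincide,
    # so an ordinary circular mean gives a rough centre.
    doubled = np.deg2rad(2.0 * a)
    center = np.rad2deg(np.arctan2(np.sin(doubled).mean(),
                                   np.cos(doubled).mean())) / 2.0
    # Refine the centre with a median of the wrapped residuals, which is less
    # sensitive to a few badly detected boxes than the mean.
    residuals = wrap_half_turn(a - center)
    return float((center + np.median(residuals)) % 180.0)


if __name__ == "__main__":
    print(dominant_line_angle([85, -85, 88, -88]))  # close to 90, not 0
```

The result is still undirected, so the 180 degree ambiguity described above remains.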

If you want to go further and make it find out which one is true, you can do one of these things:

  • You can run an open source OCR like Tesseract on one of the lines with the two candidate rotations. The one with higher confidence is the right one!
  • Find an alternative to CRAFT which gives the orientation of text on a 360 degrees scale.
  • Train a deep neural network to predict the 360 degrees orientation of a line of text (or even the whole document): it should be straightforward using something like deep-image-orientation-angle-detection. You'll also need to synthesize text images to use as training data; something like TextRecognitionDataGenerator can do that.
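The first option can be sketched roughly as follows. It assumes the image (or a crop of one text line) has already been rotated by doc_angle so the text is horizontal, which leaves only two candidates that differ by exactly 180 degrees. resolve_flip and tesseract_confidence are hypothetical helper names of mine; the second one assumes pytesseract and the Tesseract binary are installed, and averaging the word confidences is just one possible scoring choice.

```python
import numpy as np


def resolve_flip(deskewed, score_fn):
    """Return (extra_rotation_deg, upright_image) for a deskewed image.

    score_fn is any callable that maps an image to a number that is higher
    when the text is more readable.
    """
    flipped = np.rot90(deskewed, 2)  # exact 180 degree turn, no interpolation
    if score_fn(flipped) > score_fn(deskewed):
        return 180, flipped
    return 0, deskewed


def tesseract_confidence(image):
    """Mean word confidence reported by Tesseract for the given image."""
    import pytesseract  # imported lazily so resolve_flip works without it

    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    # Rows that are not words carry a confidence of -1 and empty text.
    confs = [float(c) for c, t in zip(data["conf"], data["text"])
             if float(c) >= 0 and t.strip()]
    return sum(confs) / len(confs) if confs else 0.0
```

Usage would look like extra, upright = resolve_flip(deskewed_image, tesseract_confidence), and the final rotation is then doc_angle + extra. Running this on a single clear line keeps the OCR call cheap, but scoring a few lines and taking a majority vote is probably safer.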
Mostafa Hadian