7

My program works with fax documents stored as separate bitmaps.
Is there a way to automatically detect page orientation (vertical or horizontal), so that the image preview can be shown to the user the right way up (that is, rotated if necessary)?

Any advice is much appreciated!

EDIT: Clarification:
When the fax machine receives a multi-page document, it saves each page as a separate TIFF file.
My app has a built-in viewer that displays those files. All files are scaled to A4 format and saved as TIFF (so there is no chance of detecting orientation from the height/width parameters).
My viewer displays images in portrait mode by default.

What I'd like to do is automatically detect when the original document was printed in landscape mode (e.g. wide Excel tables) and show a rotated preview to the end user, to speed up the preview process.

Obviously there are 4 possible fax orientations: portrait / landscape x 2 kinds of rotation.

I'm even interested in a simplified solution that only detects whether the original document was landscape or portrait (I've noticed that most landscape documents need to be rotated clockwise).

EDIT2: Idea
Here is one idea:
I could draw horizontal and vertical lines across the page and check which of them don't cross any black pixel. Then I can compare which kind of uncrossed line is more common (horizontal or vertical), and this decides the page orientation.
What do you think?
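
A minimal sketch of the idea above, in Python rather than the C# the app is written in. It assumes the page is already loaded as a 2D list where 1 is a black pixel and 0 is white; the function names are made up for illustration. The ink is cropped to its bounding box first so that page margins don't count as blank lines. Note that this can only tell horizontal text lines from vertical ones; it cannot tell a page from its upside-down copy.

```python
from typing import List, Sequence, Tuple

# Assumed convention: 1 = black pixel, 0 = white pixel.
Bitmap = Sequence[Sequence[int]]


def crop_to_ink(img: Bitmap) -> List[List[int]]:
    """Cut away the white margins around the inked area."""
    rows = [y for y, row in enumerate(img) if any(row)]
    if not rows:
        return []  # completely blank page
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    return [list(row[cols[0]:cols[-1] + 1]) for row in img[rows[0]:rows[-1] + 1]]


def count_blank_lines(img: Bitmap) -> Tuple[int, int]:
    """Return (blank rows, blank columns): lines that cross no black pixel."""
    blank_rows = sum(1 for row in img if not any(row))
    blank_cols = sum(1 for col in zip(*img) if not any(col))
    return blank_rows, blank_cols


def guess_text_direction(img: Bitmap) -> str:
    """'horizontal' if text lines seem to run left to right, 'vertical' if
    they seem to run top to bottom, 'unknown' if the counts do not decide."""
    cropped = crop_to_ink(img)
    if not cropped:
        return "unknown"
    blank_rows, blank_cols = count_blank_lines(cropped)
    if blank_rows > blank_cols:
        return "horizontal"
    if blank_cols > blank_rows:
        return "vertical"
    return "unknown"
```

On real faxes, noise specks would probably break the "no black pixel at all" test, so a small tolerance per line may be needed.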

Matt Warren
Maciej
  • Please clarify. Do you mean that you have a set of images containing a mix of portrait and landscape text pages, and you want to analyse each image to determine how it needs to be rotated for the text to display the right way up? Presumably there are actually 4 possible orientations, given that the originals may have been scanned "upside-down". – e100 Apr 01 '10 at 10:53

4 Answers

3

You could perform a Fast Fourier Transform (FFT) to convert your spatial image to a frequency/angle representation, then find the angle with the most prominent frequency. It sounds complicated, but it's not that hard and it's pretty efficient. In effect it tests every possible angle at once, instead of being a hard-coded hack that only works for specific angles. For a sample implementation, search for terms like "Numerical Recipes" and "FFT".
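
To give a feel for the frequency idea, here is a much reduced sketch in Python. It is not the full 2D FFT described above: it only takes a plain 1D discrete Fourier transform of the row and column ink profiles and compares the strongest non-DC peak, so it can separate horizontal from vertical text lines but not arbitrary angles. The bitmap format (2D list, 1 = black) and the function names are assumptions for illustration, and a real implementation would use an FFT library instead of this O(n²) loop.

```python
import cmath
from typing import Sequence


def dft_peak(profile: Sequence[float]) -> float:
    """Largest magnitude among the non-DC coefficients of a naive DFT."""
    n = len(profile)
    best = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(profile))
        best = max(best, abs(coeff))
    return best


def dominant_axis(img: Sequence[Sequence[int]]) -> str:
    """'horizontal' if the ink repeats most strongly from row to row
    (text lines run left to right), 'vertical' if it repeats most
    strongly from column to column, 'unknown' on a tie."""
    row_profile = [sum(row) for row in img]        # ink per row
    col_profile = [sum(col) for col in zip(*img)]  # ink per column
    row_peak = dft_peak(row_profile)
    col_peak = dft_peak(col_profile)
    if row_peak > col_peak:
        return "horizontal"
    if col_peak > row_peak:
        return "vertical"
    return "unknown"
```

Both profiles sum to the same total ink, so their peaks can be compared directly without extra normalisation.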

Liudvikas Bukys
2

You'd need OCR for that. Rolling your own OCR would be a bit difficult, but there might be a library out there worth looking into. Also, even with good OCR, it's not a 100% reliable solution.

Catdirt
  • I went that way. I used the free Tesseract .NET OCR library for C# and rotated the document until I got the best % ratio. – Maciej May 04 '11 at 09:53
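
The rotate-and-score loop described in the comment above could look roughly like this. It is a sketch in Python, not the commenter's C# code. The `score` callable is a placeholder for whatever quality figure the OCR engine reports for a page (for example a recognition confidence); no real OCR API is called here, and the bitmap is assumed to be a 2D list of pixel values.

```python
from typing import Callable, List, Tuple

Bitmap = List[List[int]]


def rotate_cw(img: Bitmap) -> Bitmap:
    """Rotate a bitmap by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def best_rotation(img: Bitmap,
                  score: Callable[[Bitmap], float]) -> Tuple[int, float]:
    """Try 0, 90, 180 and 270 degrees clockwise and return the angle
    whose rotated image gets the highest score, with that score.
    `score` is a hypothetical stand-in for an OCR confidence measure."""
    best_angle, best_score = 0, score(img)
    current = img
    for angle in (90, 180, 270):
        current = rotate_cw(current)
        candidate = score(current)
        if candidate > best_score:
            best_angle, best_score = angle, candidate
    return best_angle, best_score
```

Running OCR four times per page is slow, so scoring a downscaled copy or only a part of the page may be worth trying.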
2

I wonder if there are some properties of text you could use to help you do this.

For instance, based on a quick glance, there are far more vertical lines in text (l, j, k, m, n etc.) than horizontal ones, so maybe you could start with this.

But even detecting these isn't straightforward: you'd need to use some sort of filter like a Sobel or Prewitt. They both have horizontal and vertical versions, see here for more info.

Of course, the vertical/horizontal lines of an Excel spreadsheet would be the strongest edges, so you'd have to ignore these and look only at the text.
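
As a rough illustration, here is a sketch in Python that applies the two 3x3 Sobel kernels to a bitmap and sums the absolute responses. It assumes the image is a 2D list of pixel values (1 = black) and skips the one-pixel border; the function names are made up. It does nothing to exclude table ruling lines, so those would still have to be filtered out first.

```python
from typing import Sequence, Tuple

Bitmap = Sequence[Sequence[int]]

# Standard 3x3 Sobel kernels.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))  # responds to vertical edges
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))  # responds to horizontal edges


def filter_energy(img: Bitmap, kernel: Sequence[Sequence[int]]) -> int:
    """Sum of absolute 3x3 filter responses over all interior pixels."""
    height, width = len(img), len(img[0])
    total = 0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            response = sum(kernel[j][i] * img[y + j - 1][x + i - 1]
                           for j in range(3) for i in range(3))
            total += abs(response)
    return total


def edge_energies(img: Bitmap) -> Tuple[int, int]:
    """Return (vertical-edge energy, horizontal-edge energy)."""
    return filter_energy(img, SOBEL_X), filter_energy(img, SOBEL_Y)
```

If the guess about text holds, an upright page should give a larger first value than second, and a page turned by 90 degrees the reverse.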

Alternative: Can you not just give the user an easy way to rotate the images, like the arrows in Windows Picture Viewer, or show 4 thumbnail previews they can click on? You might need to cache the 4 versions (if you are rotating) so it's quick, but only if speed turns out to be an issue.

Matt Warren
2

Here's a paper entitled "Combined Script and Page Orientation Estimation using the Tesseract OCR engine" [pdf]

I haven't been able to find an implementation of their work, but the approach looks good to me:

The basic idea behind the proposed approach is simple.

A shape classifier is trained on characters (classes) from all the scripts of interest. At run-time, the classifier is run independently on each connected component (CC) in the image and the process is repeated after rotating each CC into three other candidate orientations (90°, 180° and 270° from the input orientation).

The algorithm keeps track of the estimated number of characters in each script for a given orientation, and the accumulated classifier confidence score across all candidate orientations. The estimate of page orientation is chosen as the one with the highest cumulative confidence score, and the estimate of script is chosen as the one with the highest number of characters in that script for the best orientation estimate.
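
The bookkeeping in the quoted description can be sketched as follows. This is a simplified Python illustration, not the paper's implementation: `classify` is a hypothetical shape classifier that returns a script name and a confidence for one connected component, every component counts as one character vote, and extracting the connected components from the page is assumed to be done elsewhere.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterable, List, Tuple

Bitmap = List[List[int]]
# Hypothetical classifier: (script name, confidence) for one component.
Classifier = Callable[[Bitmap], Tuple[str, float]]


def rotate_cw(img: Bitmap) -> Bitmap:
    """Rotate a bitmap by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def estimate_orientation_and_script(components: Iterable[Bitmap],
                                    classify: Classifier) -> Tuple[int, str]:
    """Return (clockwise angle with the highest summed confidence,
    most frequent script at that angle)."""
    confidence = defaultdict(float)       # angle -> accumulated confidence
    script_votes = defaultdict(Counter)   # angle -> script -> vote count
    for component in components:
        rotated = component
        for angle in (0, 90, 180, 270):
            if angle:
                rotated = rotate_cw(rotated)
            script, conf = classify(rotated)
            confidence[angle] += conf
            script_votes[angle][script] += 1
    if not confidence:
        raise ValueError("no connected components to classify")
    best_angle = max(confidence, key=confidence.get)
    best_script = script_votes[best_angle].most_common(1)[0][0]
    return best_angle, best_script
```

The returned angle is the clockwise rotation to apply to the input page, assuming the classifier is most confident on upright characters.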

Andrew