11

I'm trying to write a program to programmatically determine the tilt or angle of rotation in an arbitrary image.

Images have the following properties:

  • Consist of dark text on a light background
  • Occasionally contain horizontal or vertical lines which only intersect at 90 degree angles.
  • Skewed between -45 and 45 degrees.
  • See this image as a reference (its been skewed 2.8 degrees).

So far, I've come up with this strategy: Draw a route from left to right, always selecting the nearest white pixel. Presumably, the route from left to right will prefer to follow the path between lines of text along the tilt of the image.

Here's my code:

private bool IsWhite(Color c) { return c.GetBrightness() >= 0.5 || c == Color.Transparent; }

private bool IsBlack(Color c) { return !IsWhite(c); }

private double ToDegrees(decimal slope) { return (180.0 / Math.PI) * Math.Atan(Convert.ToDouble(slope)); }

private void GetSkew(Bitmap image, out double minSkew, out double maxSkew)
{
    decimal minSlope = 0.0M;
    decimal maxSlope = 0.0M;
    for (int start_y = 0; start_y < image.Height; start_y++)
    {
        int end_y = start_y;
        for (int x = 1; x < image.Width; x++)
        {
            int above_y = Math.Max(end_y - 1, 0);
            int below_y = Math.Min(end_y + 1, image.Height - 1);

            Color center = image.GetPixel(x, end_y);
            Color above = image.GetPixel(x, above_y);
            Color below = image.GetPixel(x, below_y);

            if (IsWhite(center)) { /* no change to end_y */ }
            else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
            else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
        }

        decimal slope = (Convert.ToDecimal(start_y) - Convert.ToDecimal(end_y)) / Convert.ToDecimal(image.Width);
        minSlope = Math.Min(minSlope, slope);
        maxSlope = Math.Max(maxSlope, slope);
    }

    minSkew = ToDegrees(minSlope);
    maxSkew = ToDegrees(maxSlope);
}

This works well on some images, not so well on others, and its slow.

Is there a more efficient, more reliable way to determine the tilt of an image?

Joe Phillips
  • 49,743
  • 32
  • 103
  • 159
Juliet
  • 80,494
  • 45
  • 196
  • 228
  • 1
    I like how the code mixes snake_case, camelCase, and PascalCase all in one short block of code. Clearly I've been writing too much F#. – Juliet Sep 17 '09 at 22:37
  • 1
    Why are you using `decimal`? It doesn't add much precision for calculating the slope, and you must cast it back to `double` when passing it to the `atan` method anyway. – Cecil Has a Name Sep 18 '09 at 19:13
  • @Cecil: adding up and dividing a bunch of doubles caused precision problems. Working with decimals up front, then converting to double at the end seemed to work with minimum fuss. – Juliet Sep 18 '09 at 20:45
  • Google and Google Scholar turns up myriad hits for 'document skew angle' (Thanks for the keywords, plinth). Have you had a look there for algorithm ideas? – lguy Nov 23 '09 at 19:47
  • Great thread here. Thanks. Instead of just picking next white pixel, what about picking the next white pixel that is farthest from its regional neighboring black pixels? A region the size of font_height should work pretty well. Convolution with a kernel can do this pretty quickly. – Perry Horwich May 09 '11 at 13:30

9 Answers9

6

I've made some modifications to my code, and it certainly runs a lot faster, but its not very accurate.

I've made the following improvements:

  • Using Vinko's suggestion, I avoid GetPixel in favor of working with bytes directly, now the code runs at the speed I needed.

  • My original code simply used "IsBlack" and "IsWhite", but this isn't granular enough. The original code traces the following paths through the image:

    http://img43.imageshack.us/img43/1545/tilted3degtextoriginalw.gif

    Note that a number of paths pass through the text. By comparing my center, above, and below paths to the actual brightness value and selecting the brightest pixel. Basically I'm treating the bitmap as a heightmap, and the path from left to right follows the contours of the image, resulting a better path:

    http://img10.imageshack.us/img10/5807/tilted3degtextbrightnes.gif

    As suggested by Toaomalkster, a Gaussian blur smooths out the height map, I get even better results:

    http://img197.imageshack.us/img197/742/tilted3degtextblurredwi.gif

    Since this is just prototype code, I blurred the image using GIMP, I did not write my own blur function.

    The selected path is pretty good for a greedy algorithm.

  • As Toaomalkster suggested, choosing the min/max slope is naive. A simple linear regression provides a better approximation of the slope of a path. Additionally, I should cut a path short once I run off the edge of the image, otherwise the path will hug the top of the image and give an incorrect slope.

Code

private double ToDegrees(double slope) { return (180.0 / Math.PI) * Math.Atan(slope); }

private double GetSkew(Bitmap image)
{
    BrightnessWrapper wrapper = new BrightnessWrapper(image);

    LinkedList<double> slopes = new LinkedList<double>();

    for (int y = 0; y < wrapper.Height; y++)
    {
        int endY = y;

        long sumOfX = 0;
        long sumOfY = y;
        long sumOfXY = 0;
        long sumOfXX = 0;
        int itemsInSet = 1;
        for (int x = 1; x < wrapper.Width; x++)
        {
            int aboveY = endY - 1;
            int belowY = endY + 1;

            if (aboveY < 0 || belowY >= wrapper.Height)
            {
                break;
            }

            int center = wrapper.GetBrightness(x, endY);
            int above = wrapper.GetBrightness(x, aboveY);
            int below = wrapper.GetBrightness(x, belowY);

            if (center >= above && center >= below) { /* no change to endY */ }
            else if (above >= center && above >= below) { endY = aboveY; }
            else if (below >= center && below >= above) { endY = belowY; }

            itemsInSet++;
            sumOfX += x;
            sumOfY += endY;
            sumOfXX += (x * x);
            sumOfXY += (x * endY);
        }

        // least squares slope = (NΣ(XY) - (ΣX)(ΣY)) / (NΣ(X^2) - (ΣX)^2), where N = elements in set
        if (itemsInSet > image.Width / 2) // path covers at least half of the image
        {
            decimal sumOfX_d = Convert.ToDecimal(sumOfX);
            decimal sumOfY_d = Convert.ToDecimal(sumOfY);
            decimal sumOfXY_d = Convert.ToDecimal(sumOfXY);
            decimal sumOfXX_d = Convert.ToDecimal(sumOfXX);
            decimal itemsInSet_d = Convert.ToDecimal(itemsInSet);
            decimal slope =
                ((itemsInSet_d * sumOfXY) - (sumOfX_d * sumOfY_d))
                /
                ((itemsInSet_d * sumOfXX_d) - (sumOfX_d * sumOfX_d));

            slopes.AddLast(Convert.ToDouble(slope));
        }
    }

    double mean = slopes.Average();
    double sumOfSquares = slopes.Sum(d => Math.Pow(d - mean, 2));
    double stddev = Math.Sqrt(sumOfSquares / (slopes.Count - 1));

    // select items within 1 standard deviation of the mean
    var testSample = slopes.Where(x => Math.Abs(x - mean) <= stddev);

    return ToDegrees(testSample.Average());
}

class BrightnessWrapper
{
    byte[] rgbValues;
    int stride;
    public int Height { get; private set; }
    public int Width { get; private set; }

    public BrightnessWrapper(Bitmap bmp)
    {
        Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);

        System.Drawing.Imaging.BitmapData bmpData =
            bmp.LockBits(rect,
                System.Drawing.Imaging.ImageLockMode.ReadOnly,
                bmp.PixelFormat);

        IntPtr ptr = bmpData.Scan0;

        int bytes = bmpData.Stride * bmp.Height;
        this.rgbValues = new byte[bytes];

        System.Runtime.InteropServices.Marshal.Copy(ptr,
                       rgbValues, 0, bytes);

        this.Height = bmp.Height;
        this.Width = bmp.Width;
        this.stride = bmpData.Stride;
    }

    public int GetBrightness(int x, int y)
    {
        int position = (y * this.stride) + (x * 3);
        int b = rgbValues[position];
        int g = rgbValues[position + 1];
        int r = rgbValues[position + 2];
        return (r + r + b + g + g + g) / 6;
    }
}

The code is good, but not great. Large amounts of whitespace cause the program to draw relatively flat line, resulting in a slope near 0, causing the code to underestimate the actual tilt of the image.

There is no appreciable difference in the accuracy of the tilt by selecting random sample points vs sampling all points, because the ratio of "flat" paths selected by random sampling is the same as the ratio of "flat" paths in the entire image.

Community
  • 1
  • 1
Juliet
  • 80,494
  • 45
  • 196
  • 228
  • In case you're wondering, I'm mixing decimals and doubles in a strange way for precision reasons. I keep getting NaN when calculating the linear regression using doubles, but it works fine using decimal. – Juliet Sep 18 '09 at 20:56
5

GetPixel is slow. You can get an order of magnitude speed up using the approach listed here.

Community
  • 1
  • 1
Vinko Vrsalovic
  • 330,807
  • 53
  • 334
  • 373
3

If text is left (right) aligned you can determine the slope by measuring the distance between the left (right) edge of the image and the first dark pixel in two random places and calculate the slope from that. Additional measurements would lower the error while taking additional time.

CannibalSmith
  • 4,742
  • 10
  • 44
  • 52
  • If you do go this route I'd pick somewhere around 10 - 20 random sample points and then throw out the statistical anomalies (the samples that fall in between lines of text). Then the remaining samples should draw a fairly straight line and you can use them to calculate the slope. – Steve Wortham Sep 18 '09 at 13:47
  • Based on limited experimentation, I didn't get any better results by choosing random sample points vs sampling all points. White space in the image, such as the space separating paragraphs, is sampled with a slope near zero. Since random sampling will select "flat" paths with a frequency in proportion to the total number of "flat" paths in the entire image, I don't get a better approximation, only a non-deterministic one. However, I did find that averaging all paths within a standard deviation of the mean gave me a better average overall. – Juliet Sep 18 '09 at 20:52
3

First I must say I like the idea. But I've never had to do this before and I'm not sure what all to suggest to improve reliability. The first thing I can think of this is this idea of throwing out statistical anomalies. If the slope suddenly changes sharply then you know you've found a white section of the image that dips into the edge skewing (no pun intended) your results. So you'd want to throw that stuff out somehow.

But from a performance standpoint there are a number of optimizations you could make which may add up.

Namely, I'd change this snippet from your inner loop from this:

Color center = image.GetPixel(x, end_y);
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);

if (IsWhite(center)) { /* no change to end_y */ }
else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }

To this:

Color center = image.GetPixel(x, end_y);

if (IsWhite(center)) { /* no change to end_y */ }
else
{
    Color above = image.GetPixel(x, above_y);
    Color below = image.GetPixel(x, below_y);
    if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
    else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
}

It's the same effect but should drastically reduce the number of calls to GetPixel.

Also consider putting the values that don't change into variables before the madness begins. Things like image.Height and image.Width have a slight overhead every time you call them. So store those values in your own variables before the loops begin. The thing I always tell myself when dealing with nested loops is to optimize everything inside the most inner loop at the expense of everything else.

Also... as Vinko Vrsalovic suggested, you may look at his GetPixel alternative for yet another boost in speed.

Steve Wortham
  • 21,740
  • 5
  • 68
  • 90
  • Because IsBlack == !IsWhite, the return value of IsBlack can be cached similarly and use for both the if statements. – strager Sep 17 '09 at 22:26
2

At first glance, your code looks overly naive. Which explains why it doesn't always work.

I like the approach Steve Wortham suggested, but it might run into problems if you have background images.

Another approach that often helps with images is to blur them first. If you blur your example image enough, each line of text will end up as a blurry smooth line. You then apply some sort of algorithm to basically do a regression analisys. There's lots of ways to do that, and lots of examples on the net.

Edge detection might be useful, or it might cause more problems that its worth.

By the way, a gaussian blur can be implemented very efficiently if you search hard enough for the code. Otherwise, I'm sure there's lots of libraries available. Haven't done much of that lately so don't have any links on hand. But a search for Image Processing library will get you good results.

I'm assuming you're enjoying the fun of solving this, so not much in actual implementation detalis here.

Toaomalkster
  • 115
  • 8
1

Measuring the angle of every line seems like overkill, especially given the performance of GetPixel.

I wonder if you would have better performance luck by looking for a white triangle in the upper-left or upper-right corner (depending on the slant direction) and measuring the angle of the hypotenuse. All text should follow the same angle on the page, and the upper-left corner of a page won't get tricked by the descenders or whitespace of content above it.

Another tip to consider: rather than blurring, work within a greatly-reduced resolution. That will give you both the smoother data you need, and fewer GetPixel calls.

For example, I made a blank page detection routine once in .NET for faxed TIFF files that simply resampled the entire page to a single pixel and tested the value for a threshold value of white.

richardtallent
  • 34,724
  • 14
  • 83
  • 123
1

What are your constraints in terms of time?

The Hough transform is a very effective mechanism for determining the skew angle of an image. It can be costly in time, but if you're going to use Gaussian blur, you're already burning a pile of CPU time. There are also other ways to accelerate the Hough transform that involve creative image sampling.

plinth
  • 48,267
  • 11
  • 78
  • 120
0

Your latest output is confusing me a little. When you superimposed the blue lines on the source image, did you offset it a bit? It looks like the blue lines are about 5 pixels above the centre of the text.

Not sure about that offset, but you definitely have a problem with the derived line "drifting" away at the wrong angle. It seems to have too strong a bias towards producing a horizontal line.

I wonder if increasing your mask window from 3 pixels (centre, one above, one below) to 5 might improve this (two above, two below). You'll also get this effect if you follow richardtallent's suggestion and resample the image smaller.

Toaomalkster
  • 115
  • 8
0

Very cool path finding application. I wonder if this other approach would help or hurt with your particular data set.

Assume a black and white image:

  • Project all black pixels to the right (EAST). This should give a result of a one dimensional array with a size of IMAGE_HEIGHT. Call the array CANVAS.
  • As you project all the pixels EAST, keep track numerically of how many pixels project into each bin of CANVAS.
  • Rotate the image an arbitrary number of degrees and re-project.
  • Pick the result that gives the highest peaks and lowest valleys for values in CANVAS.

I imagine this will not work well if in fact you have to account for a real -45 -> +45 degrees of tilt. If the actual number is smaller(?+/- 10 degrees), this might be a pretty good strategy. Once you have an intial result, you could consider re-running with a smaller increment of degrees to fine tune the answer. I might therefore try to write this with a function that accepted a float degree_tick as a parm so I could run both a coarse and fine pass (or a spectrum of coarseness or fineness) with the same code.

This might be computationally expensive. To optimize, you might consider selecting just a portion of the image to project-test-rotate-repeat on.

Perry Horwich
  • 2,798
  • 3
  • 23
  • 51