0

If I have an image, in which there is a page of text shot on a uniform background, how can I auto detect the boundaries between the paper and the background?

An example of the image I want to detect is shown below. The images that I will be dealing with consist of a single page on a uniform background and they can be rotated at any angle.

enter image description here

rayryeng
  • 102,964
  • 22
  • 184
  • 193
Q_1.0
  • 183
  • 9
  • 3
    StackOverflow is not a code writing service. What have you tried so far? – sco1 May 27 '15 at 15:56
  • Im just asking for any methods that will work. – Q_1.0 May 27 '15 at 16:00
  • 2
    I think you'll need to be a little more specific about the problem constraints. Can the page be rotated? Can it be any size? Can there be multiple pages in one image? Or maybe other rectangular objects in the image as well? – eigenchris May 27 '15 at 16:12
  • 1
    Yeah this question is too unconstrained. Agree with @eigenchris on the types of scenarios that you can encounter. – rayryeng May 27 '15 at 16:24
  • So only one page is present in a uniform background. The page can be rotated. – Q_1.0 May 27 '15 at 16:27
  • and it is always white? cast scanlines as a grid from all sides of border and stop on hitting white-ish enough area (color,number of pixels,intensity) remember the points, then regress 4 border lines from them compute intersections and that is it all ... – Spektre May 27 '15 at 16:41
  • It is easily solved by contrasting the image and finding location extrema on the matrix... should work on the 90% of the cases... If you have a clear background, you can delete it easily.... – Brethlosze May 27 '15 at 17:36
  • @hypfco - The method you suggested won't work on this image. The bottom of the image has a different contrast then the top, so applying the same contrast transform and thresholding (finding the extrema location as you say) will lose some of the paper at the bottom while retaining the top. Some sort of adaptive thresholding may be useful here, but it's more than just a simple contrast enhancement. – rayryeng May 28 '15 at 02:37
  • @rayryeng I am truly quite sure a global contrast would be enough for this particular case. Though a local contrast -i.e. based on local hystograms- should be the natural extension. – Brethlosze May 28 '15 at 17:07
  • @hypfco I'd like to see if a global threshold would work. I'm not convinced with your assertion. As such, I encourage that you post an answer to prove me wrong. – rayryeng May 28 '15 at 17:35
  • @rayryeng I knew you would say that, but i'm not on the gladiator mood... :).. That should be rightly answer the question. The problem is not really a hard one... – Brethlosze May 28 '15 at 17:40
  • @hypfco well you can't really say what you say unless you can prove it :P. I beg to differ that it's not a hard problem but we will leave it at that. Good luck! – rayryeng May 28 '15 at 17:41
  • @rayryeng LOL you already had the question answered!... i agree with leaving it at that.... – Brethlosze May 28 '15 at 17:55

1 Answers1

5

One simple method would be to threshold the image by some known value once you convert the image to grayscale. The problem with that approach is that we are applying a global threshold and so some of the paper at the bottom of the image will be lost if you make the threshold too high. If you make the threshold too low, then you'll certainly get the paper, but you'll include a lot of the background pixels too and it will probably be difficult to remove those pixels with post-processing.

One thing I can suggest is to use an adaptive threshold algorithm. An algorithm that has worked for me in the past is the Bradley-Roth adaptive thresholding algorithm. You can read up about it here on a post I commented on a while back:

Bradley Adaptive Thresholding -- Confused (questions)

However, if you want the gist of it, an integral image of the grayscale version of the image is taken first. The integral image is important because it allows you to calculate the sum of pixels within a window in O(1) complexity. However, the calculation of the integral image is usually O(n^2), but you only have to do that once. With the integral image, you scan neighbourhoods of pixels of size s x s and you check to see if the average intensity is less than t% of the actual average within this s x s window then this is pixel classified as the background. If it's larger, then it's classified as being part of the foreground. This is adaptive because the thresholding is done using local pixel neighbourhoods rather than using a global threshold.

I've coded an implementation of the Bradley-Roth algorithm here for you. The default parameters for the algorithm are s being 1/8th of the width of the image and t being 15%. Therefore, you can just call it this way to invoke the default parameters:

out = adaptiveThreshold(im);

im is the input image and out is a binary image that denotes what belongs to foreground (logical true) or background (logical false). You can play around with the second and third input parameters: s being the size of the thresholding window and t the percentage we talked about above and can call the function like so:

out = adaptiveThreshold(im, s, t);

Therefore, the code for the algorithm looks like this:

function [out] = adaptiveThreshold(im, s, t)

%// Error checking of the input
%// Default value for s is 1/8th the width of the image
%// Must make sure that this is a whole number
if nargin <= 1, s = round(size(im,2) / 8); end

%// Default value for t is 15
%// t is used to determine whether the current pixel is t% lower than the
%// average in the particular neighbourhood
if nargin <= 2, t = 15; end

%// Too few or too many arguments?
if nargin == 0, error('Too few arguments'); end
if nargin >= 4, error('Too many arguments'); end

%// Convert to grayscale if necessary then cast to double to ensure no
%// saturation
if size(im, 3) == 3
    im = double(rgb2gray(im));
elseif size(im, 3) == 1
    im = double(im);
else
    error('Incompatible image: Must be a colour or grayscale image');
end

%// Compute integral image
intImage = cumsum(cumsum(im, 2), 1);

%// Define grid of points
[rows, cols] = size(im);
[X,Y] = meshgrid(1:cols, 1:rows);

%// Ensure s is even so that we are able to index the image properly
s = s + mod(s,2);

%// Access the four corners of each neighbourhood
x1 = X - s/2; x2 = X + s/2;
y1 = Y - s/2; y2 = Y + s/2;

%// Ensure no co-ordinates are out of bounds
x1(x1 < 1) = 1;
x2(x2 > cols) = cols;
y1(y1 < 1) = 1;
y2(y2 > rows) = rows;

%// Count how many pixels there are in each neighbourhood
count = (x2 - x1) .* (y2 - y1);

%// Compute row and column co-ordinates to access each corner of the
%// neighbourhood for the integral image
f1_x = x2; f1_y = y2;
f2_x = x2; f2_y = y1 - 1; f2_y(f2_y < 1) = 1;
f3_x = x1 - 1; f3_x(f3_x < 1) = 1; f3_y = y2;
f4_x = f3_x; f4_y = f2_y;

%// Compute 1D linear indices for each of the corners
ind_f1 = sub2ind([rows cols], f1_y, f1_x);
ind_f2 = sub2ind([rows cols], f2_y, f2_x);
ind_f3 = sub2ind([rows cols], f3_y, f3_x);
ind_f4 = sub2ind([rows cols], f4_y, f4_x);

%// Calculate the areas for each of the neighbourhoods
sums = intImage(ind_f1) - intImage(ind_f2) - intImage(ind_f3) + ...
    intImage(ind_f4);

%// Determine whether the summed area surpasses a threshold
%// Set this output to 0 if it doesn't
locs = (im .* count) <= (sums * (100 - t) / 100);
out = true(size(im));
out(locs) = false;

end

When I use your image and I set s = 500 and t = 5, here's the code and this is the image I get:

im = imread('https://i.stack.imgur.com/MEcaz.jpg');
out = adaptiveThreshold(im, 500, 5);
imshow(out);

enter image description here

You can see that there are some spurious white pixels at the bottom white of the image, and there are some holes we need to fill in inside the paper. As such, let's use some morphology and declare a structuring element that's a 15 x 15 square, perform an opening to remove the noisy pixels, then fill in the holes when we're done:

se = strel('square', 15);
out = imopen(out, se);
out = imfill(out, 'holes');
imshow(out);

This is what I get after all of that:

enter image description here

Not bad eh? Now if you really want to see what the image looks like with the paper segmented, we can use this mask and multiply it with the original image. This way, any pixels that belong to the paper are kept while those that belong to the background go away:

out_colour = bsxfun(@times, im, uint8(out));
imshow(out_colour);

We get this:

enter image description here

You'll have to play around with the parameters until it works for you, but the above parameters were the ones I used to get it working for the particular page you showed us. Image processing is all about trial and error, and putting processing steps in the right sequence until you get something good enough for your purposes.


Happy image filtering!

Community
  • 1
  • 1
rayryeng
  • 102,964
  • 22
  • 184
  • 193
  • @roni - Thank you :) :) :) – rayryeng May 28 '15 at 06:56
  • Great writeup, I guess SO is a code writing service :p – sco1 May 28 '15 at 11:54
  • @excaza lol. The problem was an interesting one... Which is why I wrote something up. If it didn't interest me at all I would have voted to close. I actually wasn't gonna touch this one until I tried a bit of work myself and was surprised with the results I got... Figured I'd share them with everyone! – rayryeng May 28 '15 at 12:00