
I'd like to automatically clean up visible borders/shadows in scanned pages.

My idea is simple: detect the largest rectangle in the image in which all pixels are white or nearly white, then crop the image to that rectangle or flood-fill the exterior with white.
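
To make the idea concrete, here is a minimal Python sketch of the rectangle search itself. It is not something ImageMagick or netpbm provides; it assumes the page has already been loaded as a list of rows of grayscale values (0 to 255), and the function name `largest_white_rectangle` and the `threshold` of 230 for "nearly white" are illustrative assumptions.

```python
def largest_white_rectangle(pixels, threshold=230):
    """Return (x, y, width, height) of the largest rectangle whose pixels
    are all >= threshold, or None if no pixel qualifies."""
    if not pixels or not pixels[0]:
        return None
    width = len(pixels[0])
    heights = [0] * width  # per column: run of white pixels ending at this row
    best_area, best = 0, None
    for y, row in enumerate(pixels):
        for x, value in enumerate(row):
            heights[x] = heights[x] + 1 if value >= threshold else 0
        stack = []  # column indices whose heights are strictly increasing
        for x in range(width + 1):
            current = heights[x] if x < width else 0  # sentinel flushes the stack
            while stack and heights[stack[-1]] >= current:
                h = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                area = h * (x - left)
                if area > best_area:
                    best_area = area
                    best = (left, y - h + 1, x - left, h)
            stack.append(x)
    return best


# Tiny hypothetical "scan": dark border, one shadow pixel on the right edge.
page = [
    [0,   0,   0,   0,  0],
    [0, 255, 255, 255,  0],
    [0, 255, 255, 255, 40],
    [0, 250, 255, 240,  0],
    [0,   0,   0,   0,  0],
]
print(largest_white_rectangle(page))  # (1, 1, 3, 3)
```

The result has the same x, y, width, height parts as a crop geometry, so it could be handed to whichever tool does the actual cropping.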

I can write my own program for finding such a rectangle, but I'd prefer to use ImageMagick (which can also do the cropping or floodfilling), netpbm, or other utilities readily available for Linux and Cygwin.

Can they do this? How?

PS: I just found a very similar question. If the answer there works for me, this will be a duplicate.

reinierpost
  • There is an auto-crop in Gimp. I do not know whether it is possible to use it in a script, but it generally works pretty well. – Alexis Wilke Apr 28 '14 at 05:57
  • This article on [Morphology](http://www.imagemagick.org/Usage/morphology/) may be helpful. Can you post example before and after images of what you're expecting? – emcconville Apr 28 '14 at 13:44
  • @emcconville: See [this question](http://stackoverflow.com/questions/18069843/image-registration-page-border-of-photo-quasi-scan-of-book-what-algorithms-l), which is about the same problem, but asks a different question. – reinierpost Apr 29 '14 at 13:51
  • @Alexis Wilke: That appears to crop only completely blank material, like [pnmcrop](http://netpbm.sourceforge.net/doc/pnmcrop.html) - but I need to crop off shadows and borders as well. – reinierpost Apr 29 '14 at 14:01

2 Answers


convert has filters that you can apply before doing the autocrop. I have an example here:

  http://www.alexiswilke.me/blog/learning-more-about-convert-imagemagick

So use something like:

convert <in-image> -level 20%,80%,1.0 <out-image>

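This makes dark areas pitch black and light areas fully white: everything darker than 20% becomes black, everything brighter than 80% becomes white, and the range in between is stretched linearly (a gamma of 1.0 applies no extra midtone correction).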
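
As a rough illustration of what that mapping does to a single pixel, here is a small Python sketch. It mirrors the black point and white point of -level 20%,80% with a gamma of 1.0 and assumes 8-bit grayscale values; the function name `level` is made up for this example, and ImageMagick's own rounding may differ slightly.

```python
def level(value, black=0.20, white=0.80, max_value=255):
    """Stretch [black, white] to the full range and clamp everything outside it."""
    scaled = (value / max_value - black) / (white - black)
    clamped = min(1.0, max(0.0, scaled))
    return round(clamped * max_value)


# A shadow pixel, a mid-gray pixel, and an off-white paper pixel.
print(level(30), level(128), level(230))  # 0 128 255
```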

Next, compare the image line by line from the top to find how many lines to remove. The compare tool can do this, and its -fuzz option might even replace the separate -level step by tolerating near-matches during the comparison. I have not tried this in detail, so I cannot give you the exact command line for that one...

  http://www.imagemagick.org/script/compare.php

Once the compare process is done, you should have the number of lines to remove at the top, at the bottom, on the left, and on the right (if the tool only tests rows and not columns, think about rotating the image by 90 degrees).
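
The counting step can be sketched in plain Python. This is only an illustration of the logic, not what compare does: it assumes the leveled image is available as a list of rows of grayscale values, and it assumes a row or column counts as border when more than half of its pixels are dark. Both function names and both default parameters are hypothetical.

```python
def count_border_lines(lines, dark_below=128, max_dark_fraction=0.5):
    """Count leading lines in which more than max_dark_fraction of pixels are dark."""
    count = 0
    for line in lines:
        dark = sum(1 for value in line if value < dark_below)
        if dark <= max_dark_fraction * len(line):
            break
        count += 1
    return count


def border_margins(pixels):
    """Return (top, bottom, left, right) border sizes in pixels."""
    columns = [list(column) for column in zip(*pixels)]  # transpose rows to columns
    return (
        count_border_lines(pixels),
        count_border_lines(pixels[::-1]),
        count_border_lines(columns),
        count_border_lines(columns[::-1]),
    )


B, W = 0, 255
leveled = [
    [B, B, B, B, B, B, B],
    [B, B, W, W, W, W, B],
    [B, B, W, B, W, W, B],  # the dark pixel in the middle stands in for text
    [B, B, W, W, W, W, B],
    [B, B, B, B, B, B, B],
    [B, B, B, B, B, B, B],
]
print(border_margins(leveled))  # (1, 2, 2, 1)
```

Transposing the rows plays the same role as rotating the image by 90 degrees: the same row test then works on columns.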

Finally, you have the geometry and you can apply the crop:

convert <in-image> -crop <width>x<height>+<xpos>+<ypos> <final-image>
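
For reference, turning four border sizes into that <width>x<height>+<xpos>+<ypos> string is simple arithmetic. A minimal Python sketch, where the function name `crop_geometry` and the sample numbers are illustrative only:

```python
def crop_geometry(width, height, top, bottom, left, right):
    """Format a WIDTHxHEIGHT+X+Y geometry from the image size and border sizes."""
    crop_width = width - left - right
    crop_height = height - top - bottom
    if crop_width <= 0 or crop_height <= 0:
        raise ValueError("the borders leave nothing to crop")
    return f"{crop_width}x{crop_height}+{left}+{top}"


# A 7x6 image with 1 line to drop at the top, 2 at the bottom,
# 2 on the left, and 1 on the right.
print(crop_geometry(7, 6, top=1, bottom=2, left=2, right=1))  # 4x3+2+1
```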

Update:

Thinking about it, the -level option of convert would work very well along with the pnmcrop tool. That means you'd first level the image with convert, crop the leveled image, search for the location of the cropped image in the original, and use that geometry to crop the original. A synopsis would be something like this:

convert <original> -level 20%,80% <temp>
pnmcrop <temp>
compare <original> <temp>
convert <original> -crop ... <final>

Put that in a script and you've got an auto-crop that copes with the non-pure colors around the image, as mentioned.

Hmmm... Actually, the compare command would probably work a lot better if we compare against the leveled <temp> image instead of the original:

convert <original> -level 20%,80% <tempA>
pnmcrop <tempA> > <tempB>
compare -metric RMSE -subimage-search <tempA> <tempB> null:
convert <original> -crop ... <final>

I am not too sure about the exact pnmcrop and compare command-line options, but think of it like this: <tempA> is written once by convert (first line) and then used to generate <tempB>. We then search for <tempB> inside <tempA> to get a position and size (the geometry), which we finally reuse for the crop (the last convert).
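
To show what "search <tempB> inside <tempA>" amounts to, here is a brute-force Python sketch. It assumes both images are lists of rows of grayscale values and that the cropped image matches the leveled one exactly; the function name `find_subimage` is hypothetical, and a real tool would do this far more efficiently.

```python
def find_subimage(big, small):
    """Return (x, y) of the first exact occurrence of small inside big, or None."""
    big_height, big_width = len(big), len(big[0])
    small_height, small_width = len(small), len(small[0])
    for y in range(big_height - small_height + 1):
        for x in range(big_width - small_width + 1):
            if all(
                big[y + dy][x:x + small_width] == small[dy]
                for dy in range(small_height)
            ):
                return x, y
    return None


temp_a = [
    [0,   0,   0,   0, 0],
    [0, 255, 255, 200, 0],
    [0, 255,  10, 255, 0],
    [0,   0,   0,   0, 0],
]
temp_b = [
    [255, 255, 200],
    [255,  10, 255],
]
x, y = find_subimage(temp_a, temp_b)
print(f"{len(temp_b[0])}x{len(temp_b)}+{x}+{y}")  # 3x2+1+1
```

The size comes from the cropped image and the offset from the search, which together give the geometry for the final crop of the original.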

Alexis Wilke

I do this with a combination of ImageMagick and LSD, the Line Segment Detector (my question is the similar one you link to).

Your mileage may vary with different settings being tweaked (in fact, my algorithm runs through this whole process several times with different settings and at different resolutions until one is deemed "good enough"), but the general strategy I have is this:

  1. Convert the image with ImageMagick to a black-and-white PGM image (only black pixels and white pixels, not grayscale).
  2. Produce an EPS image from the PGM image that contains just the edges of the page, using LSD with some very extreme parameters.
  3. Store the rotation angle of the EPS, as detected by ImageMagick's deskew.
  4. Rotate the EPS with ImageMagick to make it straight. (My scanned images can be crooked.)
  5. Store the crop dimensions of the EPS, as reported by ImageMagick's trim.
  6. Take the original scan, rotate it by the stored rotation angle, and crop it to the stored crop dimensions using ImageMagick.
  7. If needed, use ImageMagick morphology to remove specks from poor-quality scans.
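
As one possible reading of step 6, here is a small Python sketch that only builds the convert command line; it runs nothing. The function name, file names, angle, and geometry are all hypothetical, and the sketch assumes the angle and crop dimensions from steps 3 and 5 were measured on an image with the same pixel size as the original scan.

```python
def rotate_and_crop_command(source, angle, geometry, target):
    """Build the argument list for rotating and cropping the original scan."""
    return [
        "convert", source,
        "-rotate", str(angle),  # straighten by the stored angle
        "+repage",              # reset the virtual canvas after rotating
        "-crop", geometry,      # stored crop dimensions, e.g. "2400x3300+35+42"
        "+repage",              # drop the leftover crop offset
        target,
    ]


command = rotate_and_crop_command("scan.png", -1.5, "2400x3300+35+42", "clean.png")
print(" ".join(command))
```

The list can be passed to subprocess.run(command, check=True) once the stored values look right.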

All of the params I use are fairly arbitrary/specific to my use case, but this is the general approach. Good luck!

JacobEvelyn