
I am looking for library routines for the image enhancement of (scientific) plots and diagrams. Typical examples are shown in

http://www.jcheminf.com/content/pdf/1758-2946-4-11.pdf

and Figure 3 of http://en.wikipedia.org/wiki/Anti-aliasing

These have the features that:

  • They usually use a very small number of primitives (line, character, circle, rectangle)
  • They are usually monochrome (black/white) or have a very small number of block colours
  • The originals have no gradients or patterns.

I wish to reconstruct the primitives and am looking for an algorithm to restore clean lines in the image before the next stage of analysis (which may include line detection and OCR). The noise often comes from:

  • use of JPGs (the noise is often seen close to the original primitive)
  • antialiasing
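For the monochrome case, both kinds of noise show up as grey pixels hugging the original primitive, so a global threshold is one obvious first thing to try. The following is only a minimal sketch under that assumption: the class name `Binarizer`, the method name and the luma weights are illustrative choices, not taken from any library mentioned here, and the threshold would need tuning per image.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper for illustration; not part of any library named in this thread. */
class Binarizer {

    /**
     * Returns a 1-bit black/white copy of src. Pixels whose brightness
     * (0-255) is below threshold become black, all others become white.
     */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness (assumed weights).
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luma < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

Something like `Binarizer.binarize(ImageIO.read(file), 128)` would be the starting point. A single threshold will not cope with diagrams that use several block colours; those would need per-colour handling instead.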

I require Free/Open Source solutions and would ideally like existing Java libraries. If there are any which already do some of the job of reconstructing lines, that would be a bonus! For character recognition I would be happy to isolate each character at this stage and defer OCR, though pointers to that would also be appreciated.

UPDATE: I am surprised that even with a bounty there have been no substantive replies to the question. I am therefore investigating it myself. I still invite answers but they should go beyond my own answer.

peter.murray.rust

3 Answers


ANSWER TO OWN QUESTION: Since there have been no answers after nearly a week, here is what I now plan:

I found mention of the Canny edge-detection algorithm on another SO post and then found:

http://www.tomgibara.com/computer-vision/canny-edge-detector

from Tom Gibara.

This is very easy to use in default mode and the main program is:

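    import java.awt.image.BufferedImage;
    import java.io.File;
    import javax.imageio.ImageIO;

    // CannyEdgeDetector is Gibara's class and must be on the classpath.
    // The wrapper class name is arbitrary.
    public class EdgeDemo {
        public static void main(String[] args) throws Exception {
            File file = new File("c.bmp");
            // create the detector
            CannyEdgeDetector detector = new CannyEdgeDetector();
            // adjust its parameters as desired
            detector.setLowThreshold(0.5f);
            detector.setHighThreshold(1f);
            // apply it to an image
            BufferedImage img = ImageIO.read(file);
            detector.setSourceImage(img);
            detector.process();
            // write the detected edges out as a PNG
            BufferedImage edges = detector.getEdgesImage();
            ImageIO.write(edges, "png", new File("c.png"));
        }
    }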

Here ImageIO reads and writes the bitmaps. The unprocessed image is read as a 24-bit BMP (ImageIO seems to fail with a lower colour range). The parameter values are Gibara's, used as they come out of the box.
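If the source bitmap is not already 24-bit, one possible workaround is to redraw whatever ImageIO returns into a 24-bit RGB image before handing it to the detector. This is a sketch only: `RgbConverter` and `toRgb` are made-up names, and I have not checked whether it cures the failure described above, since that depends on whether the problem lies in decoding the file or in handling the pixel format afterwards.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical helper for illustration. */
class RgbConverter {

    /** Redraws any BufferedImage into a new 24-bit TYPE_INT_RGB image. */
    static BufferedImage toRgb(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // Drawing lets Java2D do the colour-model conversion.
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }
}
```

With this in place the call would become `detector.setSourceImage(RgbConverter.toRgb(img))`.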

The edge detection is very impressive and outlines all the lines and characters. This bitmap

[image: raw bitmap]

is converted to the edges

[image: edges detected]

So now I have two tasks:

  • fit straight lines to the outlines, which are essentially clean "tramlines". I expect this to be straightforward for clean diagrams. I'd be grateful for any mention of Java libraries to fit line primitives to outlines.
  • recognize the characters. Gibara has done an excellent job of separating them, so this is an exercise in recognizing the individual glyphs. I can use the outlines to isolate the individual pixel maps for each glyph and then pass these to JavaOCR. Alternatively the outlines may be good enough to recognize the characters directly. I do NOT know what the font is, but most characters are in the 32-255 range and I believe I can build up heuristic maps.
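For isolating the glyphs, one standard approach is connected-component labelling: flood-fill each blob of foreground pixels and record its bounding box. The sketch below assumes the image has already been reduced to a `boolean[y][x]` array of "ink" pixels; `GlyphIsolator` and `boundingBoxes` are names invented for this example.

```java
import java.awt.Rectangle;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration. */
class GlyphIsolator {

    /**
     * ink[y][x] is true for foreground pixels (rectangular array assumed).
     * Returns one bounding box per 8-connected blob, in row-major scan order.
     */
    static List<Rectangle> boundingBoxes(boolean[][] ink) {
        int h = ink.length;
        int w = h == 0 ? 0 : ink[0].length;
        boolean[][] seen = new boolean[h][w];
        List<Rectangle> boxes = new ArrayList<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!ink[y][x] || seen[y][x]) {
                    continue;
                }
                int minX = x, maxX = x, minY = y, maxY = y;
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[] {x, y});
                seen[y][x] = true;
                // Breadth-first flood fill over the 8 neighbours.
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    minX = Math.min(minX, p[0]);
                    maxX = Math.max(maxX, p[0]);
                    minY = Math.min(minY, p[1]);
                    maxY = Math.max(maxY, p[1]);
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            int nx = p[0] + dx;
                            int ny = p[1] + dy;
                            if (nx < 0 || ny < 0 || nx >= w || ny >= h) {
                                continue;
                            }
                            if (ink[ny][nx] && !seen[ny][nx]) {
                                seen[ny][nx] = true;
                                queue.add(new int[] {nx, ny});
                            }
                        }
                    }
                }
                boxes.add(new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1));
            }
        }
        return boxes;
    }
}
```

Each box can then be cropped with `BufferedImage.getSubimage` and passed on for recognition. Glyphs made of several pieces (the dot on an "i", for example) come out as separate boxes and would need merging, and characters that touch a line will be fused with it.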

See "How do I properly load a BufferedImage in java?" for loading bitmaps in Java.

peter.murray.rust

"I wish to reconstruct the primitives and am looking for an algorithm to restore clean lines in the image before the next stage of analysis (which may include line detection and OCR)."

Have you looked at jaitools (http://code.google.com/p/jaitools/)?

It has an API for vectorizing graphics which is quite fast and flexible; see the API and docs here: http://jaitools.org/

LSerni

Java Library

OpenCV is the go-to library for computer vision tasks like this. There are Java bindings here: http://code.google.com/p/javacv/ . OpenCV covers everything from basic image processing filters to high-level object and motion detection algorithms.

Line Detection

For detecting straight lines, try the Hough Transform. The OpenCV Tutorials have a good explanation: http://opencv.itseez.com/doc/tutorials/imgproc/imgtrans/hough_lines/hough_lines.html#how-does-it-work
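To make the voting idea concrete, here is a minimal pure-Java sketch of the classical transform. It is not OpenCV's implementation: the class and method names are invented, the edge map is assumed to be a non-empty rectangular `boolean[y][x]` array, and the bin sizes (1 degree, 1 pixel) are arbitrary choices.

```java
/** Hypothetical illustration of classical Hough line voting. */
class HoughSketch {

    /**
     * Every edge pixel votes for all lines rho = x*cos(theta) + y*sin(theta)
     * passing through it. Returns {rho, thetaDegrees, votes} of the
     * accumulator cell with the most votes.
     */
    static int[] strongestLine(boolean[][] edges) {
        int h = edges.length;
        int w = edges[0].length;
        int maxRho = (int) Math.ceil(Math.hypot(w, h));
        // rho may be negative, so it is stored with an offset of maxRho.
        int[][] acc = new int[180][2 * maxRho + 1];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!edges[y][x]) {
                    continue;
                }
                for (int t = 0; t < 180; t++) {
                    double theta = Math.toRadians(t);
                    int rho = (int) Math.round(x * Math.cos(theta) + y * Math.sin(theta));
                    acc[t][rho + maxRho]++;
                }
            }
        }
        int[] best = {0, 0, 0};
        for (int t = 0; t < 180; t++) {
            for (int r = 0; r < acc[t].length; r++) {
                if (acc[t][r] > best[2]) {
                    best = new int[] {r - maxRho, t, acc[t][r]};
                }
            }
        }
        return best;
    }
}
```

A real detector would return every accumulator peak above a threshold rather than only the single strongest one, and would suppress near-duplicate peaks in neighbouring cells.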

The classical Hough transform outputs infinite lines, but OpenCV also implements a variant called the Probabilistic Hough Transform that outputs line segments. It should give what you need. The original academic paper is here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.9440&rep=rep1&type=pdf

Once you detect line segments, you might want to detect linked line segments and join them together. For your simple images, you will probably do just fine with a brute-force comparison of all segment endpoints. If you detect more than one endpoint within a small radius, say 2 pixels, join them together to make sure your lines are continuous. You can also measure the angle between joined line segments to detect polygons.
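A rough sketch of that brute-force pass, in plain Java with no OpenCV types: segments are assumed to be stored as `{x1, y1, x2, y2}` arrays, and `SegmentJoiner`, `joinEndpoints` and `angleBetween` are names made up for this example.

```java
/** Hypothetical illustration of brute-force endpoint joining. */
class SegmentJoiner {

    /**
     * Each segment is {x1, y1, x2, y2}. Whenever endpoints of two different
     * segments lie within radius of each other, both are moved to their
     * midpoint. Returns the number of joins made.
     */
    static int joinEndpoints(double[][] segs, double radius) {
        int joins = 0;
        for (int i = 0; i < segs.length; i++) {
            for (int j = i + 1; j < segs.length; j++) {
                // a and b index the x coordinate of an endpoint (0 or 2).
                for (int a = 0; a < 4; a += 2) {
                    for (int b = 0; b < 4; b += 2) {
                        double dx = segs[i][a] - segs[j][b];
                        double dy = segs[i][a + 1] - segs[j][b + 1];
                        if (Math.hypot(dx, dy) <= radius) {
                            double mx = (segs[i][a] + segs[j][b]) / 2;
                            double my = (segs[i][a + 1] + segs[j][b + 1]) / 2;
                            segs[i][a] = mx;
                            segs[j][b] = mx;
                            segs[i][a + 1] = my;
                            segs[j][b + 1] = my;
                            joins++;
                        }
                    }
                }
            }
        }
        return joins;
    }

    /** Angle in degrees (0 to 180) between the directions of two segments. */
    static double angleBetween(double[] s, double[] t) {
        double ux = s[2] - s[0], uy = s[3] - s[1];
        double vx = t[2] - t[0], vy = t[3] - t[1];
        double cos = (ux * vx + uy * vy) / (Math.hypot(ux, uy) * Math.hypot(vx, vy));
        // Clamp to guard against rounding just outside [-1, 1].
        return Math.toDegrees(Math.acos(Math.max(-1.0, Math.min(1.0, cos))));
    }
}
```

This is quadratic in the number of segments, which should be fine for simple diagrams. When three or more endpoints meet at one corner, pairwise midpoints will not land on exactly the same point, so a more careful version would cluster the endpoints first.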

Circle Detection

There is another version of the Hough transform that can detect circles, explained here: http://opencv.itseez.com/doc/tutorials/imgproc/imgtrans/hough_circle/hough_circle.html#hough-circle
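The idea is the same as for lines, except that each edge pixel votes for possible circle centres. Below is a minimal textbook-style sketch for a single known radius. It is not how OpenCV implements its circle detector, the names are invented, and the edge map is again assumed to be a non-empty rectangular `boolean[y][x]` array.

```java
/** Hypothetical illustration of Hough voting for circles of a known radius. */
class CircleHoughSketch {

    /**
     * Every edge pixel votes for each cell that lies at the given radius
     * from it. Returns {centreX, centreY, votes} of the best cell.
     */
    static int[] bestCentre(boolean[][] edges, int radius) {
        int h = edges.length;
        int w = edges[0].length;
        int[][] acc = new int[h][w];
        // Remembers which edge pixel last voted for a cell, so that one
        // pixel never votes twice for the same centre.
        int[][] lastVoter = new int[h][w];
        int id = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!edges[y][x]) {
                    continue;
                }
                id++;
                for (int t = 0; t < 360; t++) {
                    double a = Math.toRadians(t);
                    int cx = (int) Math.round(x - radius * Math.cos(a));
                    int cy = (int) Math.round(y - radius * Math.sin(a));
                    if (cx < 0 || cy < 0 || cx >= w || cy >= h) {
                        continue;
                    }
                    if (lastVoter[cy][cx] != id) {
                        lastVoter[cy][cx] = id;
                        acc[cy][cx]++;
                    }
                }
            }
        }
        int[] best = {0, 0, 0};
        for (int cy = 0; cy < h; cy++) {
            for (int cx = 0; cx < w; cx++) {
                if (acc[cy][cx] > best[2]) {
                    best = new int[] {cx, cy, acc[cy][cx]};
                }
            }
        }
        return best;
    }
}
```

With an unknown radius the accumulator gains a third dimension (one plane per candidate radius), which is why practical implementations use the edge gradient direction to cut down the number of votes.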

japreiss
  • I have awarded this the bounty because although I had investigated the Hough transform I hadn't realised the power of the PHT. It looks ideal for what I need (though I haven't tried it - the answer only just came in) – peter.murray.rust Jul 12 '12 at 07:57