I've been using Office Document Imaging for OCR to extract the text from an image. For this image,

I'd like to know the preprocessing steps involved in improving the quality of the image before feeding it to the OCR. So far I've tried binarization (thresholding), Gaussian blur, sharpening, mean removal, and increasing the brightness and contrast of the image, but the OCR engine still can't get the exact text (maybe 50% success).

I'd like to know the preprocessing steps (in the right order) to improve the quality, preferably in C#. The image of the screen is captured via a webcam. Thanks.

Questions

2 Answers


This image is of very good quality for OCR and will binarize cleanly. Depending on the engine, you either perform the binarization yourself or let the engine do it.

Probably you have to blacken the bottom area so that characters get separated. As the screen layout is fixed, this can be easily automated.

You also need to check if this OCR knows about this font.

enter image description here

You can delimit the white areas by profile analysis (accumulating the pixel values along each horizontal line).

enter image description here
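As a rough illustration of the profile analysis described above, here is a minimal C++ sketch. It assumes a row-major 8-bit grayscale buffer; the function names `rowProfile` and `brightBands` and the `minMean` cutoff are hypothetical and not part of any OCR engine or library. It sums each row and then reports the runs of consecutive rows whose mean intensity reaches the cutoff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sum the pixel values of every row of a row-major 8-bit grayscale image.
std::vector<long> rowProfile(const std::vector<std::uint8_t>& img,
                             std::size_t width, std::size_t height) {
    assert(img.size() == width * height);
    std::vector<long> profile(height, 0);
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            profile[y] += img[y * width + x];
    return profile;
}

// Return inclusive [first, last] row ranges whose mean intensity is at
// least minMean (a hypothetical cutoff that has to be tuned per setup).
std::vector<std::pair<std::size_t, std::size_t>>
brightBands(const std::vector<long>& profile, std::size_t width, long minMean) {
    std::vector<std::pair<std::size_t, std::size_t>> bands;
    bool inBand = false;
    std::size_t start = 0;
    for (std::size_t y = 0; y < profile.size(); ++y) {
        const bool bright = profile[y] >= minMean * static_cast<long>(width);
        if (bright && !inBand) { start = y; inBand = true; }
        if (!bright && inBand) { bands.emplace_back(start, y - 1); inBand = false; }
    }
    if (inBand) bands.emplace_back(start, profile.size() - 1);
    return bands;
}
```

Rows outside the returned bands could then be set to black before the image is passed to the OCR engine.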

  • With your image, OCR could get the text almost perfectly, just that it couldn't recognize certain characters of the font. I was able to binarize it, but since the text may appear in multiple lines dynamically, I am not sure how to blacken the bottom area. Any way to automate it in this case? – Questions Dec 06 '15 at 07:28
  • @Questions I would first detect whether a horizontal line has any black-ish pixels; if yes, only then binarize it, otherwise set it to black ... so you remember the darkest and brightest color and check whether their intensity difference is high enough ... – Spektre Dec 06 '15 at 09:30
  • @Questions: the white areas aren't so difficult to locate, by accumulating the pixel values along the horizontal. Then perform profile analysis. (See new picture.) –  Dec 06 '15 at 11:23

I have played with your image a bit in C++ with my DIP lib and here is the result:

```cpp
picture pic0,pic1;
pic0.load("ocr_green.png");
pic0.pixel_format(_pf_u);       // RGB -> Grayscale <0-765>
pic0.enhance_range();           // remove DC offset and use full dynamic range <0-765>
pic0.normalize(8,false);        // try to normalize illumination conditions of image (equalize light) based on 8x8 squares analysis, do not recolor saturated square with avg color
pic0.enhance_range();           // remove DC offset and use full dynamic range <0-765>
pic1=pic0;                      // copy result to pic1
pic0.pixel_format(_pf_rgba);    // Grayscale -> RGBA
int x,y,c,c0,c1;
for (y=0;y<pic1.ys;y++)         // process all H lines
    {
    c0=pic1.p[y][0].dd; c1=c0;  // find min and max intensity in H line
    for (x=0;x<pic1.xs;x++)
        {
        c=pic1.p[y][x].dd;
        if (c0>c) c0=c;
        if (c1<c) c1=c;
        }
    if (c1-c0<700)              // if difference not big enough blacken H line...
     for (x=0;x<pic1.xs;x++) pic1.p[y][x].dd=0;
    else                        // else binarize H line
     for (x=0;x<pic1.xs;x++)
      if (pic1.p[y][x].dd>=155) pic1.p[y][x].dd=765; else pic1.p[y][x].dd=0;
    }
pic1.pixel_format(_pf_rgba);    // Grayscale -> RGBA
```

example

The left image (pic0) is just yours converted to grayscale, with the dynamic range stretched to the maximum and the illumination equalized.

The right image (pic1) is binarized, but only on horizontal lines with a large enough spread in pixel intensity (as mentioned in my comment); the rest is set to black.

Spektre
  • I tried implementing your code, but since I am using a byte pointer to access each color component, I am not sure how to use the values 0-765. Could you please explain that? I assume it's the sum of all 3 color components. Is that right? – Questions Dec 07 '15 at 12:33
  • @Questions what pixel format do you have? If you have RGB `24/32 bit`, then each **BYTE** is `R,G,B=<0,255>` ... for grayscale I just sum `R,G,B` together, getting `I=R+G+B=<0,3*255=765>`, to simplify things ... If you want to go back to RGB, just use `R=G=B=I/3;`, which is what `pixel_format` does anyway. Each pixel of mine is a union of `{ DWORD dd; DWORD dw[2]; BYTE db[4]; }`, so I can easily access a pixel as `32bit, 2x16bit or 4x8bit` values, which correspond to full color, partial derivatives after derivation, and the r,g,b,a components – Spektre Dec 07 '15 at 12:41
  • I am using RGB32. I converted your code to C# and it looks like this, but it blackens out the entire image. http://pastebin.com/EZvepnFD – Questions Dec 07 '15 at 12:56
  • @Questions your source looks OK, but since you did not apply the dynamic range enhancement and illumination normalization steps, your thresholds will be different. You need to play with the values `700` and `155` until you get the desired output. You may get noise on the right side of the image, as the lighting conditions are different there – Spektre Dec 07 '15 at 13:44
  • @Questions try `520` instead of `700` and leave the `155` as is – Spektre Dec 07 '15 at 13:48
  • @Spektre still no luck. I am currently using the grayscale image and applying your code directly, without the other functions that you use. – Questions Dec 07 '15 at 14:22
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/97207/discussion-between-questions-and-spektre). – Questions Dec 07 '15 at 15:10
  • @Questions so how are you progressing? (btw. the chat will not notify you of new messages, you need to check it manually...) – Spektre Dec 09 '15 at 08:55
  • I am sorry for the delay in updating. With auto-adjusted contrast, the quality and output of the OCR are better. But it doesn't work the same across different images, as they are cam images taken under different lighting, and I am not sure of a generic implementation that would make it work for all images. I am still trying. – Questions Dec 10 '15 at 19:03
  • @Questions that is why I combine enhance range and normalize illumination ... it puts the thresholds near the same value for different images. If you want to avoid that, the only option I can think of is adaptive thresholding (change the value programmatically at run time until the output is the best), but for that you need to write a function that estimates the quality of the result ... – Spektre Dec 10 '15 at 19:29
  • @Spektre May I know how I can implement enhance range and normalization? Any pointers would be helpful. – Questions Dec 10 '15 at 19:42
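For the two steps asked about in the last comment, here is a minimal C++ sketch on a plain buffer. It is not the DIP library used in the answer, whose internals are not shown in this thread. The names `toIntensity`, `enhanceRange`, and `normalizeIllumination` are hypothetical, the byte order B,G,R,A for RGB32 is an assumption, and the normalization shown is a simple per-tile mean correction swapped in for whatever `normalize(8,false)` actually does. Intensities use the `I=R+G+B` convention from the comments, so they lie in 0 to 765.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kMaxI = 765;  // 3 * 255, the range quoted in the comments

// RGB32 bytes (4 per pixel, byte order B,G,R,A assumed) -> I = R + G + B.
std::vector<int> toIntensity(const std::vector<std::uint8_t>& bgra) {
    assert(bgra.size() % 4 == 0);
    std::vector<int> out(bgra.size() / 4);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = bgra[4 * i] + bgra[4 * i + 1] + bgra[4 * i + 2];
    return out;
}

// Linear stretch: the darkest pixel maps to 0, the brightest to 765.
void enhanceRange(std::vector<int>& img) {
    if (img.empty()) return;
    const auto mm = std::minmax_element(img.begin(), img.end());
    const int mn = *mm.first, mx = *mm.second;
    if (mx == mn) return;  // flat image: nothing to stretch
    for (int& v : img) v = (v - mn) * kMaxI / (mx - mn);
}

// Assumed stand-in for illumination equalization: scale each tile x tile
// block so that its mean matches the global mean, clamping to 765.
void normalizeIllumination(std::vector<int>& img, std::size_t w,
                           std::size_t h, std::size_t tile) {
    assert(img.size() == w * h && tile > 0);
    if (img.empty()) return;
    long long total = 0;
    for (int v : img) total += v;
    const long long globalMean = total / static_cast<long long>(img.size());
    for (std::size_t ty = 0; ty < h; ty += tile)
        for (std::size_t tx = 0; tx < w; tx += tile) {
            const std::size_t y1 = std::min(ty + tile, h);
            const std::size_t x1 = std::min(tx + tile, w);
            long long sum = 0;
            for (std::size_t y = ty; y < y1; ++y)
                for (std::size_t x = tx; x < x1; ++x) sum += img[y * w + x];
            const long long mean =
                sum / static_cast<long long>((y1 - ty) * (x1 - tx));
            if (mean == 0) continue;  // all-black tile: leave untouched
            for (std::size_t y = ty; y < y1; ++y)
                for (std::size_t x = tx; x < x1; ++x) {
                    const long long v = img[y * w + x] * globalMean / mean;
                    img[y * w + x] =
                        static_cast<int>(std::min<long long>(v, kMaxI));
                }
        }
}
```

To mirror the order used in the answer, one would call `toIntensity`, then `enhanceRange`, `normalizeIllumination`, and `enhanceRange` again before the per-line binarization. Because this normalization is only an approximation of the original, the thresholds `700` and `155` would probably need re-tuning.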