OpenCV for OCR: How to compute thresholding levels for gray image OCR

Question

I'm trying to prepare images for OCR, and so far here is what I've done, using info from the question "Extracting text OpenCV":

From the resulting image, I use the contours that have been filtered to make a mask as follows:

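```
#include <opencv2/opencv.hpp>

// Mask of all the text: one filled blob per kept contour.
// `letters` holds the filtered contours (std::vector<std::vector<cv::Point>>), `rgb` the input image.
cv::Mat maskF = cv::Mat::zeros(rgb.rows, rgb.cols, CV_8UC1);
// A thickness of cv::FILLED (CV_FILLED in the old C API) fills the interior of each contour.
cv::drawContours(maskF, letters, -1, cv::Scalar(255), cv::FILLED);
cv::imwrite("noise2-Mask.png", maskF);
```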

The resulting image is promising:

considering this was my original image:

Unfortunately, running Tesseract on it yields some issues. I think the levels of gray you see between the letters of words confuse Tesseract. So you're thinking: yeah, let's do a binary transform. Well, that just misses the second half of the page, so I tried applying an Otsu threshold as well, but the text becomes too pixelated and the characters lose their shape.

I tried CalcBlockMeanVariance from the question "OpenCV Adaptive Threshold OCR" but could not get it to compile (and, to be honest, I'm not certain I understand all of it). The compiler chokes on:

```
res=1.0-res;
res=Img+res;
```

Anyhow, if anyone has any suggestions, I'd appreciate them! (Note that the fractions are rarely recognized by Tesseract, but I'm writing a new training set that will hopefully improve the recognition rate.)

score 4 · Answer 1 · edited May 23 '17 at 12:00

Enhancing dynamic range and normalizing illumination

The point is to first normalize the background to a seamless (uniform) color. There are many methods to do this; here is what I tried for your image:

Create a paper/ink cell table for the image (in the same manner as in the linked answer). Select a grid cell size big enough to distinguish character features from the background; for your image I chose 8x8 pixels. Divide the image into squares and compute, for each of them, the average color and the average absolute difference from it. Then mark the saturated ones (small absolute difference) and set them as paper or ink cells according to their average color in comparison with the average color of the whole image.
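The cell-table step can be sketched on a plain single-channel image. This is not the answer's code: `GrayImage`, `Cell` and `buildCellTable` are hypothetical names, the image is assumed to be light paper with dark ink, and a cell is treated as paper when its deviation is at most the average deviation over all cells (the `dmax` rule used in the implementation below) and its average is at least the mean of all cell averages.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical single-channel image: row-major, one int per pixel.
struct GrayImage {
    int w = 0, h = 0;
    std::vector<int> px;
    int at(int x, int y) const { return px[y * w + x]; }
};

// Statistics of one sz x sz cell.
struct Cell {
    int avg = 0;         // mean gray value
    int da = 0;          // mean absolute deviation from avg
    bool paper = false;  // true for saturated background cells
};

// Builds the (h/sz) x (w/sz) cell table; partial cells at the right and bottom edges are ignored.
std::vector<std::vector<Cell>> buildCellTable(const GrayImage& img, int sz) {
    assert(sz > 0 && static_cast<int>(img.px.size()) == img.w * img.h);
    const int txs = img.w / sz, tys = img.h / sz, n = sz * sz;
    std::vector<std::vector<Cell>> tab(tys, std::vector<Cell>(txs));
    if (txs == 0 || tys == 0) return tab;
    long long daSum = 0, avgSum = 0;
    for (int ty = 0; ty < tys; ++ty)
        for (int tx = 0; tx < txs; ++tx) {
            Cell& c = tab[ty][tx];
            long long sum = 0, dev = 0;
            for (int y = ty * sz; y < (ty + 1) * sz; ++y)
                for (int x = tx * sz; x < (tx + 1) * sz; ++x) sum += img.at(x, y);
            c.avg = static_cast<int>(sum / n);
            for (int y = ty * sz; y < (ty + 1) * sz; ++y)
                for (int x = tx * sz; x < (tx + 1) * sz; ++x) dev += std::abs(img.at(x, y) - c.avg);
            c.da = static_cast<int>(dev / n);
            daSum += c.da;
            avgSum += c.avg;
        }
    const long long cells = static_cast<long long>(txs) * tys;
    const long long dmax = daSum / cells;       // "saturated" = deviation <= average deviation
    const long long imageAvg = avgSum / cells;  // mean of the cell averages
    // Saturated cells at least as bright as the image average are paper; the rest count as ink.
    for (auto& row : tab)
        for (Cell& c : row) c.paper = c.da <= dmax && c.avg >= imageAvg;
    return tab;
}
```

Note that a solid ink cell also has a small deviation; the brightness comparison is what keeps it from being classified as paper.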

Now process all lines of the image and, for each pixel, obtain the nearest paper cells to its left and right and linearly interpolate between their values. That gives you the actual background color of that pixel, so just subtract it from the image.
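For a single image row, the interpolate-and-subtract step might look like the following sketch. `subtractRowBackground` is a hypothetical helper that takes the row's cell averages and paper flags (for example from a cell table built as described above). Holding the background constant left of the first and right of the last paper cell is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// Removes the estimated background from one image row (a sketch, not the answer's code).
//   row     : gray values of the row
//   cellAvg : average gray value of each cell this row passes through
//   isPaper : paper flag of each of those cells
//   sz      : cell size in pixels
// Between two paper cells the background is interpolated linearly between their averages,
// anchored at each cell's left edge. Left of the first and right of the last paper cell it is
// held constant. Returns row - background (values may be negative).
std::vector<int> subtractRowBackground(const std::vector<int>& row,
                                       const std::vector<int>& cellAvg,
                                       const std::vector<bool>& isPaper, int sz) {
    assert(sz > 0 && cellAvg.size() == isPaper.size());
    std::vector<int> paper;  // indices of paper cells, left to right
    for (int t = 0; t < static_cast<int>(isPaper.size()); ++t)
        if (isPaper[t]) paper.push_back(t);
    std::vector<int> out = row;
    if (paper.empty()) return out;  // no paper cell: nothing to interpolate from
    const int first = paper.front(), last = paper.back();
    for (int x = 0; x < static_cast<int>(row.size()); ++x) {
        int bg;
        if (x <= first * sz) {
            bg = cellAvg[first];
        } else if (x >= last * sz) {
            bg = cellAvg[last];
        } else {
            std::size_t k = 1;  // find paper cells t0 < t1 with t0*sz <= x < t1*sz
            while (paper[k] * sz <= x) ++k;
            const int t0 = paper[k - 1], t1 = paper[k];
            const int x0 = t0 * sz, x1 = t1 * sz;
            bg = cellAvg[t0] + (cellAvg[t1] - cellAvg[t0]) * (x - x0) / (x1 - x0);
        }
        out[x] = row[x] - bg;
    }
    return out;
}
```

The result is signed: paper comes out near 0 and ink negative, so a caller would typically add a constant (for example the average paper color) and clamp back to the valid range.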

My C++ implementation for this looks like this:

```
color picture::normalize(int sz,bool _recolor,bool _sbstract)
    {
    struct _cell { color col; int a[4],da,_paper; };
    int i,x,y,tx,ty,txs,tys,a0[4],a1[4],n,dmax;
    int x0,x1,y0,y1,q[4][4][2],qx[4],qy[4];
    color c;
    _cell **tab;
    // allocate grid table
    txs=xs/sz; tys=ys/sz; n=sz*sz; c.dd=0;
    if ((txs<2)||(tys<2)) return c;
    tab=new _cell*[tys]; for (ty=0;ty<tys;ty++) tab[ty]=new _cell[txs];
    // compute grid table
    for (y0=0,y1=sz,ty=0;ty<tys;ty++,y0=y1,y1+=sz)
     for (x0=0,x1=sz,tx=0;tx<txs;tx++,x0=x1,x1+=sz)
        {
        for (i=0;i<4;i++) a0[i]=0;
        for (y=y0;y<y1;y++)
         for (x=x0;x<x1;x++)
            {
            dec_color(a1,p[y][x],pf);
            for (i=0;i<4;i++) a0[i]+=a1[i];
            }
        for (i=0;i<4;i++) tab[ty][tx].a[i]=a0[i]/n;
        enc_color(tab[ty][tx].a,tab[ty][tx].col,pf);

        tab[ty][tx].da=0;
        for (i=0;i<4;i++) a0[i]=tab[ty][tx].a[i];
        for (y=y0;y<y1;y++)
         for (x=x0;x<x1;x++)
            {
            dec_color(a1,p[y][x],pf);
            for (i=0;i<4;i++) tab[ty][tx].da+=abs(a1[i]-a0[i]);
            }
        tab[ty][tx].da/=n;
        }
    // compute max safe delta dmax = avg(delta)
    for (dmax=0,ty=0;ty<tys;ty++)
     for (tx=0;tx<txs;tx++)
      dmax+=tab[ty][tx].da;
    dmax/=(txs*tys);

    // select paper cells and compute avg paper color (x0 counts the paper cells here)
    for (i=0;i<4;i++) a0[i]=0; x0=0;
    for (ty=0;ty<tys;ty++)
     for (tx=0;tx<txs;tx++)
      if (tab[ty][tx].da<=dmax)
        {
        tab[ty][tx]._paper=1;
        for (i=0;i<4;i++) a0[i]+=tab[ty][tx].a[i];
        x0++;
        }
      else tab[ty][tx]._paper=0;
    if (x0) for (i=0;i<4;i++) a0[i]/=x0;
    enc_color(a0,c,pf);
    // remove saturated ink cells from paper (small .da but wrong .a[])
    for (ty=1;ty<tys-1;ty++)
     for (tx=1;tx<txs-1;tx++)
      if (tab[ty][tx]._paper==1)
       if ((tab[ty][tx-1]._paper==0)
         ||(tab[ty][tx+1]._paper==0)
         ||(tab[ty-1][tx]._paper==0)
         ||(tab[ty+1][tx]._paper==0))
        {
        x=0; for (i=0;i<4;i++) x+=abs(tab[ty][tx].a[i]-a0[i]);
        if (x>dmax) tab[ty][tx]._paper=2;
        }
    for (ty=0;ty<tys;ty++)
     for (tx=0;tx<txs;tx++)
      if (tab[ty][tx]._paper==2)
       tab[ty][tx]._paper=0;

    // piecewise linear interpolation H-lines
    int ty0,ty1,tx0,tx1,d;
    if (_sbstract) for (i=0;i<4;i++) a0[i]=0;
    for (y=0;y<ys;y++)
        {
        ty=y/sz; if (ty>=tys) ty=tys-1;
        // first paper cell
        for (tx=0;(tx<txs)&&(!tab[ty][tx]._paper);tx++); tx1=tx;
        if (tx>=txs) continue; // no paper cell found
        // constant segment left of the first paper cell (also keeps tx0 and d initialized)
        tx0=tx1; x0=0; x1=tx1*sz; d=sz;
        for (;;)
            {
            // interpolate background between cells tx0,tx1 and remove it
            for (x=x0;x<x1;x++)
                {
                dec_color(a1,p[y][x],pf);
                for (i=0;i<4;i++) a1[i]-=tab[ty][tx0].a[i]+(((tab[ty][tx1].a[i]-tab[ty][tx0].a[i])*(x-x0))/d)-a0[i];
                if (pf==_pf_s   ) for (i=0;i<1;i++) clamp_s32(a1[i]);
                if (pf==_pf_u   ) for (i=0;i<1;i++) clamp_u32(a1[i]);
                if (pf==_pf_ss  ) for (i=0;i<2;i++) clamp_s16(a1[i]);
                if (pf==_pf_uu  ) for (i=0;i<2;i++) clamp_u16(a1[i]);
                if (pf==_pf_rgba) for (i=0;i<4;i++) clamp_u8 (a1[i]);
                enc_color(a1,p[y][x],pf);
                }
            if (x1>=xs) break;
            // next paper cell
            for (tx++;(tx<txs)&&(!tab[ty][tx]._paper);tx++);
            if (tx<txs)
                {
                tx0=tx1; x0=tx0*sz;
                tx1=tx;  x1=tx1*sz;
                d=x1-x0;
                }
            else { tx0=tx1; x0=x1; x1=xs; } // constant segment up to the right edge (no pixel is processed twice)
            }
        }

    // recolor paper cells with avg color (remove noise)
    if (_recolor)
     for (y0=0,y1=sz,ty=0;ty<tys;ty++,y0=y1,y1+=sz)
      for (x0=0,x1=sz,tx=0;tx<txs;tx++,x0=x1,x1+=sz)
       if (tab[ty][tx]._paper)
        for (y=y0;y<y1;y++)
         for (x=x0;x<x1;x++)
          p[y][x]=c;

    // free grid table
    for (ty=0;ty<tys;ty++) delete[] tab[ty]; delete[] tab;
    return c;
    }
```

See the linked answer for more details. Here is the result for your input image after switching to gray-scale <0,765> and calling `pic1.normalize(8,false,true);`:
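The gray-scale range <0,765> matches summing the three 8-bit channels (3 * 255 = 765). A minimal sketch of that conversion and of converting back to RGBA, assuming the sum interpretation and a hypothetical `0xAARRGGBB` pixel packing (the real layout depends on your image class); `toGray765` and `fromGray765` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed meaning of gray-scale <0,765>: the sum R+G+B of the 8-bit channels.
// Pixels are assumed to be packed as 0xAARRGGBB.
std::vector<int> toGray765(const std::vector<std::uint32_t>& pixels) {
    std::vector<int> gray;
    gray.reserve(pixels.size());
    for (std::uint32_t c : pixels) {
        const int r = (c >> 16) & 0xFF, g = (c >> 8) & 0xFF, b = c & 0xFF;
        gray.push_back(r + g + b);
    }
    return gray;
}

// Back to an opaque gray RGBA pixel: each channel gets one third of the sum.
std::uint32_t fromGray765(int v) {
    assert(v >= 0 && v <= 765);
    const std::uint32_t c = static_cast<std::uint32_t>(v / 3);
    return 0xFF000000u | (c << 16) | (c << 8) | c;
}
```

Summing rather than averaging keeps three times as many intensity levels, which helps the later thresholding and histogram steps.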

Binarize

I tried naive range thresholding first: if all color channel values (R,G,B) of a pixel are in the range <min,max>, it is recolored to c1, otherwise to c0:
```
void picture::treshold_AND(int min,int max,int c0,int c1) // all-channel thresholding: <min,max> -> c1, (-inf,min)+(max,+inf) -> c0
    {
    int x,y,i,a[4],e;
    for (y=0;y<ys;y++)
     for (x=0;x<xs;x++)
        {
        dec_color(a,p[y][x],pf);
        for (e=1,i=0;i<3;i++) if ((a[i]<min)||(a[i]>max)){ e=0; break; }
        if (e) for (i=0;i<4;i++) a[i]=c1;
         else  for (i=0;i<4;i++) a[i]=c0;
        enc_color(a,p[y][x],pf);
        }
    }
```
After applying `pic1.treshold_AND(0,127,765,0);` and converting back to RGBA, I got this result:

The gray noise is due to JPEG compression (PNG would be too big). As you can see, the result is more or less acceptable.
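With a single gray channel, the range threshold reduces to one comparison per pixel. A minimal sketch on a plain `std::vector<int>` (`thresholdRange` is a hypothetical name); calling `thresholdRange(gray, 0, 127, 765, 0)` uses the same parameters as `pic1.treshold_AND(0,127,765,0)`.

```cpp
#include <cassert>
#include <vector>

// Single-channel range threshold: values inside <min,max> become c1, all others become c0.
// With one gray channel, the per-channel loops of treshold_AND are not needed.
void thresholdRange(std::vector<int>& gray, int min, int max, int c0, int c1) {
    assert(min <= max);
    for (int& v : gray) v = (v >= min && v <= max) ? c1 : c0;
}
```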

If this is not enough, you can divide your image into segments. Compute a histogram for each segment (it should be bimodal), then find the color between the 2 maxima; that is your threshold value. The problem is that the background covers a much larger area, so the ink peak is relatively small and sometimes hard to spot on a linear scale. See the full-image histogram:
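A minimal sketch of the per-segment threshold: build the histogram of one segment's gray values, take the highest peak, take the highest bin at least `minDist` bins away from it as the second peak, and return the lowest bin between the two. `histogram`, `valleyThreshold` and `minDist` are hypothetical; `minDist` is a tuning assumption that keeps the shoulder of the large background peak from being mistaken for the small ink peak.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Histogram of one segment; every value must lie in [0, levels).
std::vector<int> histogram(const std::vector<int>& values, int levels) {
    std::vector<int> h(levels, 0);
    for (int v : values) {
        assert(v >= 0 && v < levels);
        ++h[v];
    }
    return h;
}

// Threshold at the lowest bin between the two dominant modes of a (hopefully bimodal)
// histogram. Returns -1 if no second peak at least minDist bins away is found.
int valleyThreshold(const std::vector<int>& h, int minDist) {
    const int n = static_cast<int>(h.size());
    if (n == 0) return -1;
    const int p1 = static_cast<int>(std::max_element(h.begin(), h.end()) - h.begin());
    int p2 = -1;
    for (int i = 0; i < n; ++i)
        if (std::abs(i - p1) >= minDist && h[i] > 0 && (p2 < 0 || h[i] > h[p2])) p2 = i;
    if (p2 < 0) return -1;
    const int lo = std::min(p1, p2), hi = std::max(p1, p2);
    int t = lo;  // leftmost minimum between the peaks
    for (int i = lo; i <= hi; ++i)
        if (h[i] < h[t]) t = i;
    return t;
}
```

Running this on each segment's pixels gives one threshold per segment. Empty bins caused by quantization can create false minima, which is why small gaps should be filled before searching for the valley.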

When you do this for each segment, it will be much better (as there will be much less background/text color bleeding around the thresholds), so the gap will be more visible. Also, do not forget to ignore the small gaps (missing vertical lines in the histogram), as they are just related to quantization/encoding/rounding (not all gray shades are present in the image). So you should filter out gaps smaller than a few intensities, replacing them with the average of the last and next valid histogram entries.
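The gap filtering can be sketched as follows: runs of empty bins no longer than `maxGap`, with non-empty bins on both sides, are replaced by the average of those two neighbors. `fillSmallGaps` and `maxGap` are hypothetical names; leaving longer runs and runs that touch either end of the histogram untouched is an assumption, since those may be the real valley.

```cpp
#include <cassert>
#include <vector>

// Fills quantization gaps in a histogram. Runs of empty bins no longer than maxGap that are
// bounded by non-empty bins on both sides get the average of those two neighbors.
std::vector<int> fillSmallGaps(std::vector<int> h, int maxGap) {
    assert(maxGap >= 0);
    const int n = static_cast<int>(h.size());
    int i = 0;
    while (i < n) {
        if (h[i] != 0) { ++i; continue; }
        int j = i;
        while (j < n && h[j] == 0) ++j;  // [i, j) is a run of empty bins
        if (i > 0 && j < n && j - i <= maxGap) {
            const int fill = (h[i - 1] + h[j]) / 2;
            for (int k = i; k < j; ++k) h[k] = fill;
        }
        i = j;
    }
    return h;
}
```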

I will try this out - haven't found a way to add this to my obj-C project but will do research to see how I can add a method to the picture object. — Xav, Sep 04 '16 at 00:26
@Xav If you are using a 3rd-party image lib, it is not a good idea to mess with it directly. Instead, you can write a custom function outside the image class that takes such an image as an operand and changes it. Yes, my image is a 2D array/matrix of pixels where each pixel is a 32-bit unsigned int (`DWORD`) with supported encodings `_pf_rgba (4x8bit uint), _pf_u (1x32bit uint), _pf_s (1x32bit int), and more`. The `dec_color/enc_color` functions just unpack/pack this to a `DWORD[4]` array on a per-channel basis to make the code universal for any pixel format. You can ignore all that, as you only have gray-scale. — Spektre, Sep 04 '16 at 07:21
@Xav So all the 3- and 4-iteration `for`s can disappear if your image is encoded as 1-channel gray-scale. You can also improve this a lot if you compute both horizontal and vertical lines and use the average of both, or use cubic interpolation, or interpolate the gaps in the table first. But as you see, even simple piecewise linear interpolation is good. PS: that code was used for bicubic interpolation, so there are many unused variables left like `q[4][4][4]` etc. (I forgot to erase them). — Spektre, Sep 04 '16 at 07:29
@Spektre, I think it is a great solution. By the way, could you share a python project which will get the same result? — Gary Chen, Apr 26 '21 at 14:56
@XueQing No, I can't, as I do not code in Python or Matlab... — Spektre, Apr 26 '21 at 20:29

score 1 · Answer 2 · edited May 23 '17 at 12:00


You could try to erase all the text with an erosion and then subtract the result from the grayscale image, or use a top-hat transform instead, to make your background uniform for global thresholding! Here you can find some example code! And why not use the existing adaptiveThreshold function?
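In OpenCV itself this is `cv::morphologyEx` with `cv::MORPH_TOPHAT` (or `cv::MORPH_BLACKHAT` for dark text on light paper). The sketch below spells out the white top-hat with plain `std::vector` to show what happens: an erosion (min filter) removes bright strokes narrower than the window, a dilation (max filter) restores the slowly varying background, and subtracting that background leaves the text on a flat background. `Gray`, `rankFilter`, `topHat` and the radius `r` are hypothetical; the sketch assumes bright text on a dark background, so a dark-on-light scan would be inverted first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Gray {  // hypothetical row-major grayscale image
    int w, h;
    std::vector<int> px;
};

// Square min (erode) or max (dilate) filter with radius r; the window is clipped at the borders.
Gray rankFilter(const Gray& in, int r, bool takeMax) {
    assert(r >= 0 && static_cast<int>(in.px.size()) == in.w * in.h);
    Gray out{in.w, in.h, std::vector<int>(in.px.size())};
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x) {
            int best = in.px[y * in.w + x];
            for (int yy = std::max(0, y - r); yy <= std::min(in.h - 1, y + r); ++yy)
                for (int xx = std::max(0, x - r); xx <= std::min(in.w - 1, x + r); ++xx) {
                    const int v = in.px[yy * in.w + xx];
                    best = takeMax ? std::max(best, v) : std::min(best, v);
                }
            out.px[y * in.w + x] = best;
        }
    return out;
}

// White top-hat: image minus its opening (erode, then dilate). Bright features narrower than
// 2*r+1 pixels survive, while the smooth background is subtracted out.
Gray topHat(const Gray& img, int r) {
    const Gray opened = rankFilter(rankFilter(img, r, false), r, true);
    Gray out = img;
    for (std::size_t i = 0; i < out.px.size(); ++i) out.px[i] -= opened.px[i];
    return out;
}
```

The window must be wider than the thickest stroke; otherwise parts of the characters end up in the background estimate and are subtracted away.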


answered Aug 31 '16 at 10:16 by PSchn

score 1 · Answer 3 · answered Feb 06 '20 at 12:08

I tried to maximize the distance between the minimum and the maximum value of each channel and then inverted the colors (see code):

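```
import cv2
import numpy as np

img1 = cv2.imread('img.png')
img_reverted = img1.copy()

for i in range(3):
    channel = img1[:, :, i].astype(np.float64)
    # stretch the channel so that its minimum maps to 0 and its maximum to 255
    span = max(channel.max() - channel.min(), 1)  # avoid dividing by zero on a flat channel
    tmp = (channel - channel.min()) / span * 255
    # invert the colors
    img_reverted[:, :, i] = (255 - tmp).astype('uint8')
```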

The results are quite good. (See images)

Original image:

Reverted image:
