I am not able to get good text-detection accuracy from Tesseract. Please check the code and image below.
Mat imgInput = CvInvoke.Imread(@"D:\workspace\raw2\IMG_20200625_194541.jpg", ImreadModes.AnyColor);
int kernel_size = 11;
//Dilation
Mat imgDilatedEdges = new Mat();
CvInvoke.Dilate(
    imgInput,
    imgDilatedEdges,
    CvInvoke.GetStructuringElement(
        ElementShape.Rectangle,
        new Size(kernel_size, kernel_size),
        new Point(-1, -1)), //(-1, -1) anchors the kernel at its center
    new Point(-1, -1),
    1,
    BorderType.Default,
    new MCvScalar(0));
//Blur
Mat imgBlur = new Mat();
CvInvoke.MedianBlur(imgDilatedEdges, imgBlur, kernel_size);
//Abs diff
Mat imgAbsDiff = new Mat();
CvInvoke.AbsDiff(imgInput, imgBlur, imgAbsDiff);
//Normalize in place (imgAbsDiff is reused below)
CvInvoke.Normalize(imgAbsDiff, imgAbsDiff, 0, 255, NormType.MinMax, DepthType.Default);
Mat imgThreshhold = new Mat();
//truncate: values above 230 are clamped to 230 (maxValue is ignored for Trunc; the return value is just the threshold used)
double thresholdval = CvInvoke.Threshold(imgAbsDiff, imgThreshhold, 230, 0, ThresholdType.Trunc);
//Normalize
CvInvoke.Normalize(imgThreshhold, imgThreshhold, 0, 255, NormType.MinMax, DepthType.Default);
imgThreshhold.Save(@"D:\workspace\ocr_images\IMG_20200625_194541.jpg");
//contrast correction: apply CLAHE to the lightness (L) channel in Lab space
Mat lab = new Mat();
CvInvoke.CvtColor(imgThreshhold, lab, ColorConversion.Bgr2Lab);
VectorOfMat labChannels = new VectorOfMat();
CvInvoke.Split(lab, labChannels);
CvInvoke.CLAHE(labChannels[0], 3.0, new Size(12, 12), labChannels[0]);
Mat clahe = new Mat();
//merge
CvInvoke.Merge(labChannels, clahe);
//convert back to BGR so the CLAHE result is what gets saved
Mat imgContrast = new Mat();
CvInvoke.CvtColor(clahe, imgContrast, ColorConversion.Lab2Bgr);
Image<Bgr, byte> output = imgContrast.ToImage<Bgr, byte>();
Bitmap bmp = output.ToBitmap();
//setting image to 300 dpi since tesseract likes that
bmp.SetResolution(300, 300);
bmp.Save(@"D:\workspace\ocr_images\IMG_20200625_194541.jpg");
I am not getting the expected accuracy. Please check how the image is converted.
I have posted a few images above that you can refer to. For the first image I get garbage data. For the last two images I get partial data. Converting the image to grayscale and playing with the threshold gives better output.
I want to understand: if the threshold is the key part, how can I get a dynamic threshold value for each new image? This is going to run as a service, so the user will simply pass in an image and get the result. My app should be intelligent enough to process and understand the image.
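For reference, one common way to pick a threshold per image is Otsu's method, which derives the value from the image histogram. The following is only a minimal sketch, assuming a 3-channel BGR input; the class and method names (`DynamicThreshold`, `BinarizeWithOtsu`) are hypothetical, and it has not been checked against the images in this question.

```csharp
using Emgu.CV;
using Emgu.CV.CvEnum;

public static class DynamicThreshold
{
    // Hypothetical helper: binarizes an image with a threshold chosen per image by Otsu's method.
    public static Mat BinarizeWithOtsu(Mat bgrInput, out double chosenThreshold)
    {
        // Otsu's method needs a single-channel 8-bit image.
        Mat gray = new Mat();
        CvInvoke.CvtColor(bgrInput, gray, ColorConversion.Bgr2Gray);

        Mat binary = new Mat();
        // With the Otsu flag the threshold argument (0) is ignored;
        // the return value is the threshold computed from the histogram.
        chosenThreshold = CvInvoke.Threshold(
            gray, binary, 0, 255, ThresholdType.Binary | ThresholdType.Otsu);
        return binary;
    }
}
```

Otsu's method assumes a roughly bimodal histogram (text versus background), so it may not be sufficient for photos with strong shadows or uneven lighting.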
Do I have to adjust the contrast and threshold more accurately? If yes, how do I do that? Or is the image itself faulty, i.e. is noise causing the problem?
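To make the noise and threshold question concrete, this is the kind of local (adaptive) thresholding with a light denoising step that could be tried. It is a sketch under assumptions: the helper name `BinarizeAdaptive`, the 3x3 median kernel, the block size of 31 and the offset of 10 are illustrative values, not tuned for the images here.

```csharp
using Emgu.CV;
using Emgu.CV.CvEnum;

public static class LocalThreshold
{
    // Hypothetical helper: denoises, then thresholds each pixel against its local neighbourhood.
    // blockSize must be odd and greater than 1; offset is subtracted from the local weighted mean.
    public static Mat BinarizeAdaptive(Mat bgrInput, int blockSize = 31, double offset = 10)
    {
        Mat gray = new Mat();
        CvInvoke.CvtColor(bgrInput, gray, ColorConversion.Bgr2Gray);

        // Median blur suppresses salt-and-pepper noise; the kernel size must be odd.
        Mat denoised = new Mat();
        CvInvoke.MedianBlur(gray, denoised, 3);

        Mat binary = new Mat();
        CvInvoke.AdaptiveThreshold(
            denoised, binary, 255,
            AdaptiveThresholdType.GaussianC, ThresholdType.Binary,
            blockSize, offset);
        return binary;
    }
}
```

Because the threshold is computed per neighbourhood rather than once per image, this approach tends to cope better with uneven lighting, at the cost of two extra parameters to choose.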
Please let me know what I am doing wrong in the algorithm, or anything that will help me understand the issue. If anyone knows, please tell me what the ideal image preprocessing steps for OCR are.
I am using C#, Emgu CV and Tesseract. Any suggestions will be highly appreciated.