I am using Tesseract version 3.0.2.0. My code is below:
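string tessDataDir = @"D:\temp";
string ocrOutput = "";
// imagePath holds the path to the input image and is defined elsewhere
using (var engine = new TesseractEngine(tessDataDir, "eng", EngineMode.Default))
{
    // Treat the whole image as a single character
    engine.DefaultPageSegMode = PageSegMode.SingleChar;
    using (var image = Pix.LoadFromFile(imagePath))
    {
        using (var page = engine.Process(image))
        {
            // Read the recognized text for the page
            ocrOutput = page.GetText();
        }
    }
}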
I am getting a lot of incorrect characters. For example, X is sometimes detected as "J" and sometimes as "fi".
1) The JPEG image below is detected as "L" even though it shows "X". Can anyone tell me why?
2) Also, how can I disable dictionary use in Tesseract? Thanks.