I am using Tesseract OCR to read German PNG images in C++, and I have problems with some special characters such as
ß, ä, ö, ü and so on.
Do I need to train Tesseract to read these correctly, or what else needs to be done?
This is the part of the original image read by Tesseract.
tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
// UPDATE
SetConsoleOutputCP(1252); // console output code page set to Windows-1252 (German)
SetConsoleCP(1252);       // console input code page set to Windows-1252 (German)
wcout << "ÄÖÜ?ß" << endl;
// Open input image with leptonica library
Pix *image = pixRead("D:\\Images\\Document.png");
api->Init("D:\\TesseractBeispiele\\Tessaractbeispiel\\Tessaractbeispiel\\tessdata", "deu");
api->SetImage(image);
api->SetVariable("save_blob_choices", "T");
api->SetRectangle(1000, 3000, 9000, 9000);
api->Recognize(NULL);
// Get OCR result
wcout << api->GetUTF8Text();
After adding the code below the UPDATE marker, the hard-coded umlauts are displayed correctly, but the text read from the image still isn't correct. What do I need to change?
The Tesseract version is 3.0.2 and the Leptonica version is 1.68.
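One possible cause worth checking, offered as a sketch rather than a confirmed fix: GetUTF8Text() returns UTF-8 encoded bytes, whereas the console above is switched to code page 1252, so each umlaut would arrive as two bytes and show up as two wrong characters. The helper utf8ToWide below is a hypothetical name of my own, not part of Tesseract or Leptonica. It decodes UTF-8 into a std::wstring that could then be passed to wcout. It is a minimal decoder and does not reject overlong encodings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Tesseract API): decodes UTF-8 bytes into a wide
// string. Malformed or truncated sequences are replaced with U+FFFD.
std::wstring utf8ToWide(const std::string& utf8) {
    std::wstring out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(static_cast<wchar_t>(0xFFFD));
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
                ok = false;  // sequence ended early; re-examine this byte next
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
            ++i;
        }
        if (!ok) cp = 0xFFFD;
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t (as on Windows): emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

In the program above this would be used roughly as: char *text = api->GetUTF8Text(); wcout << utf8ToWide(text) << endl; delete[] text; (the returned buffer has to be freed with delete[]). Whether the console then shows every character correctly still depends on the locale and console font, which this sketch does not address.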