15

I'm using the tessnet2 wrapper for the Tesseract 2.04 source on Windows XP, configured to build for x86.

The main function of the TessarctTest project contains:

        Bitmap bmp = new Bitmap(@"C:\temp\New Folder\dotnet\eurotext.tif");
        tessnet2.Tesseract ocr = new tessnet2.Tesseract();
        // ocr.SetVariable("tessedit_char_whitelist", "0123456789");
        ocr.Init(@"C:\temp\tessdata", "eng", false);
        // List<tessnet2.Word> r1 = ocr.DoOCR(bmp, new Rectangle(792, 247, 130, 54));
        List<tessnet2.Word> r1 = ocr.DoOCR(bmp, Rectangle.Empty);
        int lc = tessnet2.Tesseract.LineCount(r1);

When I run the program, it crashes on the following line inside ocr.Init:

int result = m_myTessBaseAPIInstance->InitWithLanguage((char *)_tessdata.ToPointer(), NULL, (char *)_lang.ToPointer(), NULL, numericMode, 0, NULL);

Does anyone have an idea?

Much appreciated!

Jack
  • Is the ocr class a wrapper for this DLL code, which looks like C/C++ to me? If so, do the parameters of the Init wrapper method match those of the InitWithLanguage function? – t0mm13b Jan 13 '10 at 00:45
  • By the way, can you provide a bit more information please for us fellow SO'ers? Hints, clues graciously accepted... – t0mm13b Jan 13 '10 at 00:56
  • When I try to set a breakpoint inside "InitWithLanguage", I can't. I get the message: "The breakpoint will not be hit. No executable code is associated with this line. Possible causes include: conditional compilation or compiler optimizations." – Jack Jan 13 '10 at 00:59
  • Is the tessnet2 a C++ DLL? If that is the case, then it is a native assembly built in Release mode hence you are not seeing any executable code. Are you using P/Invoke? – t0mm13b Jan 13 '10 at 01:12
  • I just googled tessnet2 and came across this site...http://www.pixel-technology.com/freeware/tessnet2/ There is a mention of memory leaks.. perhaps that could be a contributing factor? – t0mm13b Jan 13 '10 at 01:13
  • yes... i saw it... but i don't think that this is the problem in this case.. – Jack Jan 13 '10 at 01:19
  • How are you calling the function? Perhaps something is missing before or within InitWithLanguage, such as a configuration file or a DLL? – t0mm13b Jan 13 '10 at 01:23
  • I was having a problem where my app was crashing on ocr.Init too. It had been working before I made some changes that should not have affected anything. I was calling it by passing null for the first value, and the tessdata directory was in my Debug dir (default behavior). I had to point it at the tessdata in another directory to get it to work. Not sure why that worked, but you might want to download the tessdata again or move it to see if that helps. – juharr Apr 17 '10 at 22:46

8 Answers

24

For anyone still having a problem after all of this: if you're using tessnet2, make sure that you download the correct language files.

You want the "English language data for Tesseract (2.00 and up)" download, not the "English language data for Tesseract 3.01" one. I hope this saves you a few hours! :)
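A quick way to tell the two data sets apart on disk is to look at the file names. The sketch below is not part of tessnet2; the class and method names are hypothetical. It relies on the fact that Tesseract 3.x packs each language into a single `<lang>.traineddata` file, while the 2.x data consists of several files per language such as `eng.inttemp`.

```csharp
using System.IO;

// Hypothetical helper, not part of tessnet2: guesses which Tesseract
// generation the files in a tessdata folder belong to.
public static class TessdataVersion
{
    // Tesseract 3.x ships one packed file per language, e.g. eng.traineddata.
    public static bool LooksLikeTesseract3Data(string tessdataDir, string lang)
    {
        return File.Exists(Path.Combine(tessdataDir, lang + ".traineddata"));
    }

    // Tesseract 2.x ships several files per language; eng.inttemp is one of them.
    public static bool LooksLikeTesseract2Data(string tessdataDir, string lang)
    {
        return File.Exists(Path.Combine(tessdataDir, lang + ".inttemp"));
    }
}
```

If `LooksLikeTesseract3Data(dir, "eng")` is true and `LooksLikeTesseract2Data(dir, "eng")` is false, the folder most likely holds the wrong download for tessnet2.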

Adam K Dean
  • Thanks, Adam, this fixed the problem for me, as I was trying to use language files for version 3.0. Tessnet2 is based on Tesseract 2.0, of course. – Nikola Malešević May 29 '12 at 11:47
  • Indeed, it is such a simple mistake to make that I found myself looking over code and libraries and all sorts when it was quite simply, the wrong files. Glad it helped! – Adam K Dean Jun 29 '12 at 10:12
  • But what if you want to train it for your own font? Should I still download the language data files, and where should I put them? – Mr.Noob Jul 27 '12 at 09:17
  • This hasn't worked for me. I don't know what to do anymore :( – Codemunkeee Dec 13 '13 at 08:22
15

For those attempting to use the Tessnet2 assembly for the Tesseract OCR engine in C# and who are running into the problem of the Tesseract.Init() method causing your app to crash - I found one possible cause.

First, I'm assuming you have the files as follows:

bin\Debug\MyDotNetApp.exe
bin\Debug\tessdata\eng.DangAmbigs
bin\Debug\tessdata\eng.freq-dawg
bin\Debug\tessdata\eng.inttemp
bin\Debug\tessdata\eng.pffmtable
bin\Debug\tessdata\eng.unicharset
bin\Debug\tessdata\eng.user-words
bin\Debug\tessdata\eng.word-dawg

And are using this for the initialization:

using (var ocr = new tessnet2.Tesseract())
{
    ocr.Init(null, "eng", false);
    ...
}

In theory that should work. For me it did work - but then it didn't all of a sudden... even though I didn't change anything that would affect it.

For me the fix was to search through the registry (using regedit) and remove all references to tesseract. There were some suspicious entries that I think may have been created when I installed the Tesseract 3.00 installer (tesseract-ocr-setup-3.00.exe).

When I deleted those entries and rebooted (I had tried rebooting before removing the reg entries, FYI), everything worked again.

Were the registry entries causing the problem? Who knows. But it did fix my problem.

dkr88
  • dkr88, thank you for this answer, as it solved my problem. I did exactly what you posted and everything works now! – brozo Jun 23 '11 at 14:57
  • Did the same, but it still didn't work for me. I'm confused right now. I have trained Tesseract for my own font, uninstalled Tesseract 3.01, and cleaned the registry with registry cleaner software as well. Can someone tell me which files I should include in the tessdata folder? – Mr.Noob Jul 27 '12 at 14:12
  • Thanks for this. This solved my problem. I did not delete any registry entries; I just uninstalled Tesseract 3, which cleaned up the registry after itself, and a reboot was required. – Marek Sep 27 '12 at 17:02
  • @dkr88: Thanks a lot. I followed your code: List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty); int lc = tessnet2.Tesseract.LineCount(result); MessageBox.Show("" + lc); string imgText = ""; foreach (tessnet2.Word word in result) { imgText = imgText + " " + word.Text.ToString().Trim(); } MessageBox.Show("Image Text is :" + imgText); But it displays lc=1 and word.Text=~. My image actually contains more than 10 lines of English content. How can I extract that content from my image? – Saravanan Jun 19 '13 at 06:03
3

Project + Properties, Debug tab, scroll down, tick the "Enable unmanaged code debugging" checkbox. Now you can set a breakpoint and debug it.


If your IDE doesn't support mixed mode debugging, you can attach a debugger using the technique outlined in this post.

Hans Passant
1

Make sure your tessdata folder (C:\temp\tessdata) contains the English language data files. The files are: eng.DangAmbigs, eng.freq-dawg, eng.inttemp, eng.normproto, eng.pffmtable, eng.unicharset, eng.user-words, eng.word-dawg. Download the files from the Tesseract downloads page. The file to download is tesseract-2.00.eng.tar.gz.
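The file list above can be checked in code before calling Init, so that a missing file produces a readable message instead of a crash inside the native code. This is a minimal sketch using only System.IO; the class and method names are hypothetical and not part of tessnet2.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical pre-flight check, not part of tessnet2.
public static class TessdataCheck
{
    // Suffixes of the eight Tesseract 2.x data files named in this answer.
    private static readonly string[] Suffixes =
    {
        "DangAmbigs", "freq-dawg", "inttemp", "normproto",
        "pffmtable", "unicharset", "user-words", "word-dawg"
    };

    // Returns the names of the expected files that are absent from tessdataDir.
    public static List<string> MissingFiles(string tessdataDir, string lang)
    {
        var missing = new List<string>();
        foreach (string suffix in Suffixes)
        {
            string name = lang + "." + suffix;
            if (!File.Exists(Path.Combine(tessdataDir, name)))
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

For example, `TessdataCheck.MissingFiles(@"C:\temp\tessdata", "eng")` should return an empty list when all eight files are in place; anything else names the files to download again.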

mcdon
1

In my case the answer from dkr88 did the job, thanks a lot. I guess some dependency got corrupted when Tesseract was previously installed as a standalone program. Furthermore, the OCR quality seems to be better than with MODI, although the tilt correction of the latter works under more extreme circumstances (vertical text).

I'm pretty happy with tessnet2 now. There is only one drawback: I needed to change my app.config (as described on the internet) and added the following:

<startup useLegacyV2RuntimeActivationPolicy="true">
    <supportedRuntime version="v4.0"/>
</startup>
B. Verhoeff
1

My problem was that I wasn't running the application with administrator permissions.

When I right-clicked the application, chose "Run as", and selected the local Administrator account, it worked.

Nick
0

In my case, I made the following changes to get it to work :)

  1. Downloaded https://tesseract-ocr.googlecode.com/files/tesseract-2.00.eng.tar.gz
  2. Pasted tessdata folder to my Debug folder
  3. Changed the following code from

ocr.Init(@"D:\MyApplication\MyApplication\Debug", "eng", false);

to

ocr.Init(null, "eng", false);
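Passing null relies on tessnet2 finding a tessdata folder next to the executable, as the answers on this page describe. The sketch below is a hypothetical helper, not a tessnet2 API. Under that assumption, it computes where the folder would be expected and reports whether it exists, which can be logged before calling Init.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of tessnet2. Assumes the default lookup
// location is a "tessdata" folder beside the running executable.
public static class TessdataLocator
{
    // Full path of the tessdata folder expected next to the executable.
    public static string DefaultTessdataDir()
    {
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "tessdata");
    }

    // True when that folder is present on disk.
    public static bool DefaultTessdataDirExists()
    {
        return Directory.Exists(DefaultTessdataDir());
    }
}
```

If `DefaultTessdataDirExists()` returns false, the tessdata folder was probably not copied to the build output directory.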

Palanikumar
0

In my case I set the "Copy to Output Directory" property of the tessdata files to "Copy always", and then it didn't crash on the Init line.

JBAkroyd