
I have written the following .NET code to read text from an image.

Platform: Windows 10, Visual Studio 2015, tesseract-ocr-setup-4.00.00dev and tessnet2.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                var image = new Bitmap(@"D:\Python\download.jpg");
                var ocr = new Tesseract();
                // Load the English ("eng") data from the tessdata folder
                ocr.Init(@"C:\Program Files (x86)\Tesseract-OCR\tessdata", "eng", false);
                // Run OCR on the whole image (Rectangle.Empty = no sub-region)
                var result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    Console.WriteLine(word.Text);
                    // Append a newline so the words are not run together in the output file
                    File.AppendAllText(@"D:\Python\writefile.txt", word.Text + Environment.NewLine);
                }
                Console.ReadLine();
            }
        }
    }

I have tried both "Any CPU" and "x86" as the platform target, and I have also tried changing the target framework version in Project Properties.

However, I'm getting the following error:

    An unhandled exception of type 'System.IO.FileLoadException' occurred in mscorlib.dll

    Additional information: Mixed mode assembly is built against version 'v2.0.50727' of the runtime and cannot be loaded in the 4.0 runtime without additional configuration information.

Edit: I added the following to my app.config, which removed that error. It now looks like this:

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <startup useLegacyV2RuntimeActivationPolicy="true">
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
      </startup>
    </configuration>

I installed the NuGet package by following this: https://www.nuget.org/packages/NuGet.Tessnet2/

However, I'm still not able to read text from the image. I downloaded the image from Google Images, and it contains text.

Here is the message I'm getting:

[screenshot of the message]

When I checked the path C:\Program Files (x86)\Tesseract-OCR\tessdata, this is what it contains:

[screenshot of the tessdata folder contents]

What am I doing wrong? How to fix this?

AskMe
  • 2,495
  • 8
  • 49
  • 102
  • Have you tried https://stackoverflow.com/questions/2455654/what-additional-configuration-is-necessary-to-reference-a-net-2-0-mixed-mode? It seems tessnet2 is built against .NET 2.0 (see http://www.pixel-technology.com/freeware/tessnet2/) – mortb Aug 15 '17 at 08:25
  • Yes, I referred to that. However, it seems unable to read the tessdata folder at C:\Program Files (x86)\Tesseract-OCR\tessdata. How can I make it read that folder? – AskMe Aug 15 '17 at 08:52
  • Have you checked whether there are any files in `C:\Program Files (x86)\Tesseract-OCR\tessdata`? Maybe the folder or a file is missing. – mortb Aug 15 '17 at 08:57
  • Thanks for the clue. I have updated the question. It looks like unicharset is missing. Any suggestions? Where can I get unicharset for this folder? I installed an EXE that I expected to include these files and folders by default. – AskMe Aug 15 '17 at 09:08

1 Answer


The issue was resolved by downloading the language packages, which were missing previously, from https://github.com/tesseract-ocr/langdata.

The most important thing for Tessnet2 to work is to have the language package for each language you want. For this sample, I use English.

Download the language package and extract it to the "..\Tesseract-OCR\tessdata" folder.

Note: It looks like the language packages are not placed in tessdata by default during installation.
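Before calling `ocr.Init`, one way to confirm that the language data actually landed in tessdata is to list the `eng.*` files there. The sketch below is hypothetical: the `TessdataCheck` class and its `HasLanguageData` helper are not part of tessnet2, and it assumes the tessnet2-era layout in which a language ships as separate files such as `eng.unicharset` (the file the comments identified as missing). It uses only `System.IO` and LINQ.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of tessnet2: checks the tessdata folder
// before handing it to ocr.Init.
public static class TessdataCheck
{
    // Returns true if tessdataDir exists and contains "<lang>.unicharset".
    // Assumption: the language data is stored as separate "<lang>.*" files.
    public static bool HasLanguageData(string tessdataDir, string lang)
    {
        if (!Directory.Exists(tessdataDir))
        {
            Console.WriteLine("Folder not found: " + tessdataDir);
            return false;
        }

        // List every file belonging to the requested language, e.g. eng.*
        string[] langFiles = Directory.GetFiles(tessdataDir, lang + ".*");
        foreach (string file in langFiles)
        {
            Console.WriteLine("  found " + Path.GetFileName(file));
        }

        bool hasUnicharset = langFiles.Any(f =>
            Path.GetFileName(f).Equals(lang + ".unicharset", StringComparison.OrdinalIgnoreCase));
        if (!hasUnicharset)
        {
            Console.WriteLine("Missing " + lang + ".unicharset in " + tessdataDir);
        }
        return hasUnicharset;
    }

    public static void Main()
    {
        string tessdata = @"C:\Program Files (x86)\Tesseract-OCR\tessdata";
        if (!HasLanguageData(tessdata, "eng"))
        {
            Console.WriteLine("Extract the language package into tessdata before calling ocr.Init.");
            return;
        }
        Console.WriteLine("Language data found; ocr.Init can be pointed at " + tessdata);
    }
}
```

If the check fails, the folder listing it prints shows which `eng.*` files are present, which is the same information the screenshot of the tessdata folder was meant to convey.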

Here is my modified version of the code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using tessnet2;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;
    using System.IO;

    namespace ConsoleApplication2
    {
        class Program
        {
            static void Main(string[] args)
            {
                var image = new Bitmap(@"D:\Python\download.jpg");
                tessnet2.Tesseract ocr = new tessnet2.Tesseract();
                // Load the English ("eng") data from the tessdata folder
                ocr.Init(@"C:\Program Files (x86)\Tesseract-OCR\tessdata", "eng", false);
                // Run OCR on the whole image (Rectangle.Empty = no sub-region)
                List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
                foreach (tessnet2.Word word in result)
                {
                    // Print each recognized word with its confidence score
                    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
                }

                Console.Read();
            }
        }
    }

Cheers!!!

AskMe