I need to convert an image to text, but the task is easier than it sounds.

The image is not a scanned document, and it is not rotated, skewed, or upside down. It's a clean screenshot from a game (similar to a screenshot of some text in Notepad). I also know exactly how big the text is and where it is, and it is very easy to remove the background and make the text black on white.

The font will always stay the same (although I don't know which font it is), so maybe I could train something to read this specific font?

I also need to call this from a C# application, so I'm looking for some way in C# to say: here's the Bitmap (or the path to a bitmap), give me what it says in plain text.

I already tried Tesseract OCR, but I seem to be doing something wrong, because the output is almost always wrong. The only tool that gave good results (just one small mistake, an "at" becoming "a t") was Capture2Text, but I have no idea how to use it from C#.

Here's a small sample of what it should be able to read: http://i.imgur.com/PdEGznk.png

Wolf
user2839747
  • Check whether this can help you: http://stackoverflow.com/a/21496107/3051661 – pankeel Feb 01 '14 at 11:29
  • Accurate OCR is an unsolved problem. It works reasonably on printed text, the kind that's rendered at 2400 dpi and scanned at no less than 600 dpi. Screenshots are 96 dpi, often intentionally blurry due to anti-aliasing. Worse, your letters are only 6 pixels high. A program like Capture2Text was specifically engineered for screenshots, so it is likely to be tweaked enough to have a shot at it. Don't do it. – Hans Passant Feb 01 '14 at 15:21

4 Answers

I ran your sample image through Tesseract.NET and got "Evorvze SWOYG"; after rescaling the image to 300 DPI, I got "Bronze sword".
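
A screenshot has no physical resolution, so "rescaling to 300 DPI" in practice means enlarging the pixels. Assuming the screenshot was rendered at the usual 96 DPI, that is a factor of about 300/96, or roughly 313%. Here is a minimal command-line sketch of that step, not the Tesseract.NET code used for the result above. It assumes ImageMagick's `convert` and a reasonably recent `tesseract` CLI are installed; the function names `scale_percent` and `ocr_upscaled` are made up for illustration.

```shell
#!/usr/bin/env bash

# scale_percent SRC_DPI TARGET_DPI
# Prints the resize percentage (rounded to the nearest integer) needed
# to go from SRC_DPI to TARGET_DPI.
scale_percent() {
    local src=$1 target=$2
    echo $(( (target * 100 + src / 2) / src ))
}

# ocr_upscaled IMAGE [SRC_DPI]
# Enlarges IMAGE to the equivalent of 300 DPI (SRC_DPI defaults to 96,
# an assumption about the screenshot) and prints the recognized text.
ocr_upscaled() {
    local img=$1 src_dpi=${2:-96}
    local pct tmp status
    pct=$(scale_percent "$src_dpi" 300)
    tmp=$(mktemp --suffix=.png)
    # "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file.
    convert "$img" -resize "${pct}%" "$tmp" && tesseract "$tmp" stdout
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage would look like `ocr_upscaled sample.png`, or `ocr_upscaled sample.png 72` for an image assumed to be 72 DPI. Whether 300 DPI is the best target for a given font is something to check by experiment.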

nguyenq
  • Tesseract Wiki ([FAQ](http://code.google.com/p/tesseract-ocr/wiki/FAQ), [ImproveQuality](http://code.google.com/p/tesseract-ocr/wiki/ImproveQuality)) recommends that resolution. – nguyenq Feb 01 '14 at 23:00
  • 300 dpi is the recommended setting in general for any OCR-related image processing. – Ilya Evdokimov Feb 04 '14 at 03:11
  • Updated links: [FAQ](https://tesseract-ocr.github.io/tessdoc/FAQ.html), [Improving quality](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html) – dotjpg3141 Oct 21 '21 at 12:31

I just added the code below, which scales the image to twice its size, and now it recognizes numbers perfectly!

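// Assumes newBitmap is the source image and width/height are its size in pixels.
Bitmap b = new Bitmap(width * 2, height * 2);
using (Graphics g1 = Graphics.FromImage(b))
{
    // Bicubic interpolation keeps glyph edges smooth when enlarging.
    g1.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
    // Draw the source image into the doubled canvas.
    g1.DrawImage(newBitmap, 0, 0, width * 2, height * 2);
}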
ChrisF
G. Goncharov

I strongly suggest not resampling up to 300 DPI: it will create a lot of dithering, which is bad for OCR. Some engines, such as Nuance and ABBYY, are smart enough to deal with fonts in 72 DPI images.

The OCR engine in the MODI library uses an old version of Nuance, which will be substantially better than Capture2Text and Tesseract.

Chris Riley

The idea: whenever a new screenshot file appears in the folder, run Tesseract OCR on it and open the result in a text editor.

You can use the script below on Linux, or on Windows with WSL (Ubuntu on Windows).

Leave the script running in the output directory of your favorite screenshot tool.

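#!/usr/bin/env bash
# wait_for_it.sh
# Watch the current directory for newly created or moved-in files.
inotifywait -m . -e create -e moved_to |
    while read -r path action file; do
        echo "The file '$file' appeared in directory '$path' via '$action'"
        cd "$path"
        # Only OCR PNG files; the quotes keep the test valid for short names.
        if [ "${file: -4}" == ".png" ]; then
            # Writes the recognized text to "$file.txt".
            tesseract "$file" "$file"
            sleep 1
            gedit "$file".txt &
        fi
    done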

You will need these packages installed:

sudo apt install tesseract-ocr
sudo apt install inotify-tools

I use it with Shutter on Ubuntu, and with Greenshot on Windows with WSL (Ubuntu on Windows).
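
Two details of the script are worth knowing: it matches only a lowercase `.png` suffix, and it names the text file after the full image name (`shot.png` produces `shot.png.txt`). If either bothers you, here is a minimal sketch of two helpers that could replace the suffix test and the output name. The names `is_png` and `ocr_base` are hypothetical, not part of the script above.

```shell
#!/usr/bin/env bash

# is_png NAME
# Succeeds if NAME ends in .png, ignoring case (shot.png, Shot.PNG, ...).
is_png() {
    case "$1" in
        *.[Pp][Nn][Gg]) return 0 ;;
        *)              return 1 ;;
    esac
}

# ocr_base NAME
# Prints NAME without its last extension, e.g. "my shot.png" -> "my shot".
# Passing this as tesseract's output base yields "my shot.txt".
ocr_base() {
    printf '%s\n' "${1%.*}"
}
```

Inside the loop this would read `is_png "$file" && tesseract "$file" "$(ocr_base "$file")"`, followed by opening `"$(ocr_base "$file").txt"` in the editor.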

Eduard Florinescu