
I have been looking for an example of how to write a class or function that reads text from the screen at specified coordinates.

Something simple that uses BitBlt to capture the specified section of the screen and runs Tesseract on it, all in memory, without writing image files to disk.

Tesseract seems to have a really poor API and requires a TIF image of all things. As far as I can see, it can't even be made to accept an in-memory bitmap without extensive delving into its code.

Any help would be appreciated; an actual example would be ideal.

2 Answers


Image: http://i.imgur.com/HaJ2zOI.png

The code below shows how to use Tesseract-OCR with an image held in memory.

#include <iostream>
#include <vector>
#include <stdexcept>
#include <fstream>
#include <memory>
#include <cstring>
#include <cstdint>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

#if defined _WIN32 || defined _WIN64
#include <windows.h>
#endif

class Image
{
    private:
        std::vector<std::uint8_t> Pixels;
        std::uint32_t width, height;
        std::uint16_t BitsPerPixel;

        void Flip(void* In, void* Out, int width, int height, unsigned int Bpp);

    public:
        #if defined _WIN32 || defined _WIN64
        explicit Image(HDC DC, int X, int Y, int Width, int Height);
        #endif

        inline std::uint16_t GetBitsPerPixel() {return this->BitsPerPixel;}
        inline std::uint16_t GetBytesPerPixel() {return this->BitsPerPixel / 8;}
        inline std::uint16_t GetBytesPerScanLine() {return (this->BitsPerPixel / 8) * this->width;}
        inline int GetWidth() const {return this->width;}
        inline int GetHeight() const {return this->height;}
        inline const std::uint8_t* GetPixels() {return this->Pixels.data();}
};

void Image::Flip(void* In, void* Out, int width, int height, unsigned int Bpp)
{
   // Bytes per scan line; 24-bit rows are padded to a multiple of 4 bytes.
   unsigned long Chunk = (Bpp > 24 ? width * 4 : width * 3 + width % 4);
   const unsigned char* Source = static_cast<const unsigned char*>(In);
   unsigned char* Destination = static_cast<unsigned char*>(Out);

   // Copy every row, from the last source row to the first, so the
   // bottom-up bitmap ends up in top-down order.
   for (int Row = height - 1; Row >= 0; --Row)
   {
      std::memcpy(Destination, Source + Chunk * Row, Chunk);
      Destination += Chunk;
   }
}

#if defined _WIN32 || defined _WIN64
Image::Image(HDC DC, int X, int Y, int Width, int Height) : Pixels(), width(Width), height(Height), BitsPerPixel(32)
{
    BITMAP Bmp = {0};
    HBITMAP hBmp = reinterpret_cast<HBITMAP>(GetCurrentObject(DC, OBJ_BITMAP));

    if (GetObject(hBmp, sizeof(BITMAP), &Bmp) == 0)
        throw std::runtime_error("BITMAP DC NOT FOUND.");

    RECT area = {X, Y, X + Width, Y + Height};
    HWND Window = WindowFromDC(DC);
    GetClientRect(Window, &area);

    HDC MemDC = GetDC(nullptr);
    HDC SDC = CreateCompatibleDC(MemDC);
    HBITMAP hSBmp = CreateCompatibleBitmap(MemDC, width, height);
    DeleteObject(SelectObject(SDC, hSBmp));

    BitBlt(SDC, 0, 0, width, height, DC, X, Y, SRCCOPY);
    unsigned int data_size = ((width * BitsPerPixel + 31) / 32) * 4 * height;
    std::vector<std::uint8_t> Data(data_size);
    this->Pixels.resize(data_size);

    BITMAPINFO Info = {sizeof(BITMAPINFOHEADER), static_cast<long>(width), static_cast<long>(height), 1, BitsPerPixel, BI_RGB, data_size, 0, 0, 0, 0};
    GetDIBits(SDC, hSBmp, 0, height, &Data[0], &Info, DIB_RGB_COLORS);
    this->Flip(&Data[0], &Pixels[0], width, height, BitsPerPixel);

    DeleteDC(SDC);
    DeleteObject(hSBmp);
    ReleaseDC(nullptr, MemDC);
}
#endif

int main()
{
    #if defined _WIN32 || defined _WIN64
    HWND SomeWindowHandle = GetDesktopWindow();
    HDC DC = GetDC(SomeWindowHandle);

    Image Img = Image(DC, 0, 0, 200, 200); //screenshot of 0, 0, 200, 200..

    ReleaseDC(SomeWindowHandle, DC);
    #else
    // The Image class above has no constructor taking a raw pixel pointer; add one for your platform.
    #error "This example only captures the screen on Windows."
    #endif

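    std::unique_ptr<tesseract::TessBaseAPI> tesseract_ptr(new tesseract::TessBaseAPI());

    // Init takes the data path and the language as two separate arguments and returns 0 on success.
    if (tesseract_ptr->Init("/tesseract/tessdata", "eng") != 0)
    {
        std::cerr<<"Could not initialize tesseract.\n";
        return 1;
    }
    tesseract_ptr->SetImage(Img.GetPixels(), Img.GetWidth(), Img.GetHeight(), Img.GetBytesPerPixel(), Img.GetBytesPerScanLine());

    // GetUTF8Text returns a buffer that must be freed with delete[], which unique_ptr<char[]> does.
    std::unique_ptr<char[]> utf8_text_ptr(tesseract_ptr->GetUTF8Text());

    if (utf8_text_ptr)
        std::cout<<utf8_text_ptr.get()<<"\n";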

    return 0;
}
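
The scan-line size and the row flip used in the code above can be tried in isolation, without Windows or Tesseract. The following is only a minimal sketch: `RowStride` and `FlipRows` are hypothetical helper names that do not appear in the answer, and the sketch assumes the input buffer holds at least `stride * height` bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: bytes per DIB scan line, rounded up to a 4-byte boundary.
// This is the same arithmetic the answer uses when it computes data_size.
std::size_t RowStride(std::size_t width, std::size_t bitsPerPixel)
{
    return ((width * bitsPerPixel + 31) / 32) * 4;
}

// Hypothetical helper: returns a copy of a bottom-up image with its rows
// reordered top-down. Assumes in.size() >= stride * height.
std::vector<std::uint8_t> FlipRows(const std::vector<std::uint8_t>& in,
                                   std::size_t stride, std::size_t height)
{
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t row = 0; row < height; ++row)
    {
        // Source row counted from the bottom, destination row from the top.
        const std::uint8_t* src = in.data() + stride * (height - 1 - row);
        std::memcpy(out.data() + stride * row, src, stride);
    }
    return out;
}
```

For a 200-pixel-wide, 32-bit capture, `RowStride(200, 32)` gives 800 bytes per line, which is the value passed to `SetImage` as the last argument.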
Brandon
  • Oh man.. I made an error in the post.. I fixed it. It's: `Img.GetBytesPerScanLine()` Not `Img.GetWidth * Img.GetBytesPerScanLine`. That was my fault for copying the code from the `Image class` to the main due to laziness.. I just had time to test it and it works. If it still gives you problems do: `SetImage(Img.GetPixels(), Img.GetWidth(), Img.GetHeight(), 4, Integer(Img.GetWidth() * 4))`. That's how I debugged it. Shouldn't have any problems though! The above code should work as is.. – Brandon Apr 08 '14 at 02:27
  • Believe me, I understand! After all, the library alone is frustrating to get working. It really is a pain. Glad it works though =) – Brandon Apr 08 '14 at 02:31
  • It printed fine for me. I've added a `Save` option to the `Image class`. You should now be able to save the image and see what your screenshot looks like. This will `let you see` what you're passing to tesseract-ocr. – Brandon Apr 08 '14 at 02:54
  • If the screenshot came out fine and everything else is fine, most likely tesseract does not recognize the characters or some other option needs setting.. It's been a while since I actually used tesseract (other than for testing this post).. Seeing as the code above works and that tesseract is getting the right images, the only thing I can think of is that you might have to use black text on a white background? Not too sure but that's what I tested it with. Anyway, I've got school so I'll check back in the morning and see how things go with you. I'll try to come up with something. – Brandon Apr 08 '14 at 03:03
  • If anything, you can make a thread explaining your problem that tesseract is not recognizing characters and see if anyone else has an idea until I get back from school to help debug it.. – Brandon Apr 08 '14 at 03:04
  • It does take a bit of debugging to get right. The guys at villavu (a pascal scripting site) use it pretty much everyday without problems iirc.. I'll test it on Firefox when I awake and I'll let you know how it goes. For now just try it on something like notepad and see if it recognizes your text. I'll add loading from file to the image class so you can test it on those. – Brandon Apr 08 '14 at 03:15
  • Well I added the ability to flip the image now.. Bitmaps by nature are stored upside down in memory. This should now flip it up-right. Should work now. – Brandon Apr 08 '14 at 17:54
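
One of the comments above guesses that black text on a white background may be recognized more reliably. If that turns out to matter for a given capture, the 32-bit buffer could be reduced to grayscale and optionally inverted before it is handed to Tesseract. This is a sketch under assumptions, not something from the thread: `ToGray` is a hypothetical helper, and it assumes the BGRA byte order that a 32-bit `BI_RGB` capture produces.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts 32-bit BGRA rows into an 8-bit grayscale
// buffer with one byte per pixel, keeping the row order of the input.
// When invert is true, light-on-dark text becomes dark-on-light.
std::vector<std::uint8_t> ToGray(const std::uint8_t* bgra, std::size_t width,
                                 std::size_t height, std::size_t stride, bool invert)
{
    std::vector<std::uint8_t> gray(width * height);
    for (std::size_t y = 0; y < height; ++y)
    {
        for (std::size_t x = 0; x < width; ++x)
        {
            const std::uint8_t* p = bgra + y * stride + x * 4; // p[0]=B, p[1]=G, p[2]=R
            // Integer approximation of 0.299 R + 0.587 G + 0.114 B (weights sum to 256).
            unsigned luma = (p[2] * 77u + p[1] * 150u + p[0] * 29u) >> 8;
            gray[y * width + x] = static_cast<std::uint8_t>(invert ? 255u - luma : luma);
        }
    }
    return gray;
}
```

The result has one byte per pixel and `width` bytes per line, so it would be passed along the lines of `SetImage(gray.data(), width, height, 1, width)`. Whether inversion actually improves recognition depends on the input and is worth testing rather than assuming.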

You can do it like this on Windows.

#include <string.h>
#include <windows.h>
#include <tesseract/capi.h>

#define BITS_PER_PIXEL   32
#define BYTES_PER_PIXEL  (BITS_PER_PIXEL / 8)

// pTessBaseAPI must already be created and initialized by the caller.
void ReadFromScreen(TessBaseAPI* pTessBaseAPI, RECT rc)
{
    HWND hWndDesktop = GetDesktopWindow();
    HDC hDC = GetDC(hWndDesktop);

    int nWidth = rc.right - rc.left;
    int nHeight = rc.bottom - rc.top;
    BITMAPINFO bi;
    memset(&bi, 0, sizeof(bi));
    bi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bi.bmiHeader.biWidth = nWidth;
    bi.bmiHeader.biHeight = -nHeight; // negative height: rows are stored top-down, so no flip is needed
    bi.bmiHeader.biPlanes = 1;
    bi.bmiHeader.biBitCount = BITS_PER_PIXEL;
    bi.bmiHeader.biCompression = BI_RGB;

    void* pixels = NULL;
    HBITMAP hBitmap = ::CreateDIBSection(NULL, &bi, DIB_RGB_COLORS, &pixels, NULL, 0);
    HDC hMemDC = CreateCompatibleDC(NULL);
    HGDIOBJ hOldBitmap = SelectObject(hMemDC, hBitmap);
    BitBlt(hMemDC, 0, 0, nWidth, nHeight, hDC, rc.left, rc.top, SRCCOPY);
    TessBaseAPISetImage(pTessBaseAPI, (const unsigned char*)pixels, nWidth, nHeight, BYTES_PER_PIXEL, BYTES_PER_PIXEL * nWidth);
    if (TessBaseAPIRecognize(pTessBaseAPI, NULL) == 0)
    {
        char* szText = TessBaseAPIGetUTF8Text(pTessBaseAPI);
        // TODO: do something with szText

        TessDeleteText(szText);
    }

    // Release the GDI resources on every path, including a failed recognition.
    SelectObject(hMemDC, hOldBitmap);
    DeleteObject(hBitmap);
    DeleteDC(hMemDC);
    ReleaseDC(hWndDesktop, hDC);
}
botao yang