Screen Region Recognition to Find Field location on Screen

Question

I am trying to figure out a way of getting Sikuli's image recognition to use within C#. I don't want to use Sikuli itself because its scripting language is a little slow, and because I really don't want to introduce a java bridge in the middle of my .NET C# app.

So, I have a bitmap which represents an area of my screen (I will call this region BUTTON1). The screen layout may have changed slightly, or the screen may have been moved on the desktop -- so I can't use a direct position. I have to first find where the current position of BUTTON1 is within the live screen. (I tried to post pictures of this, but I guess I can't because I am a new user... I hope the description makes it clear...)

I think that Sikuli is using OpenCV under the covers. Since it is open source, I guess I could reverse engineer it, and figure out how to do what they are doing in OpenCV, implementing it in Emgu.CV instead -- but my Java isn't very strong.

I looked for examples showing this, but all of the examples are either extremely simple (ie, how to recognize a stop sign) or very complex (ie how to do facial recognition)... and maybe I am just dense, but I can't seem to make the jump in logic of how to do this.

Also I worry that all of the various image manipulation routines are actually processor intensive, and I really want this as lightweight as possible (in reality I might have lots of buttons and fields I am trying to find on a screen...)

So, the way I am thinking about doing this instead is:

A) Convert the bitmaps to byte arrays and do brute force search. (I know how to do that part). And then

B) Use the byte array position that I found to calculate its screen position (I'm really not completely sure how I do this) instead of using the image processing stuff.

Is that completely crazy? Does anyone have a simple example of how one could use Aforge.Net or Emgu.CV to do this? (Or how to flesh out step B above...?)

Thanks!

+1 Awesome question - I hope someone will come along with a good answer! — Charles, May 26 '11 at 15:12

score 1 · Accepted Answer · answered Jul 23 '11 at 20:37

Generally speaking, it sounds like you want basic object recognition. I don't have any experience with SIKULI, but there are a number of ways to do object recognition (Edge based template matching, etc.). That being said you might be able to go with just straight histogram matching.

http://www.codeproject.com/KB/GDI-plus/Image_Processing_Lab.aspx

That page should show you how to use AForge.net to get the histogram of an image. You would just do a brute force search using something like this:

Bitmap ImageSearchingWithin=new Bitmap("Location of image"); //or just load from a screenshot or whatever
for (int x = 0; x < ImageSearchingWithin.Width - WidthOfImageSearchingFor; ++x)
{
    for (int y = 0; y < ImageSearchingWithin.Height - HeightOfImageSearchingFor; ++y)
    {
        Bitmap MySmallViewOfImage = ImageSearchingWithin.Clone(new Rectangle(x, y, WidthOfImageSearchingFor, HeightOfImageSearchingFor), System.Drawing.Imaging.PixelFormat.Format24bppRgb);
    }
}

And then compare the newly created bitmap's histogram to the one that you calculated of the original image (whatever area is the closest in terms of matching is what you would select as being the region of BUTTON1). It's not the most elegant solution but it might work for your needs. Otherwise you get into more difficult techniques (of course I could be forgetting something at the moment that might be simpler).

I'm going to accept this answer, only because I didn't see anything else of value. To be honest, I abandoned this when I found out that the OCR library underneath Sikuli has some memory leak problems. The histogram would be the way to go, but there is a lot of work left to get there from here... — MarkJoel60, Aug 22 '11 at 20:44

Screen Region Recognition to Find Field location on Screen

1 Answers1