
I'm using OpenCV with CUDA / CUBLAS / CUFFT support to perform some very basic template matching on grayscale screenshots to extract some text, meaning I can assume the following:

I do know the exact size, color and rotation of the raw sprites I'm trying to match, i.e., I have access to the textures that were used in the observed program's rendering process.

However, since the sprites are partially transparent, normal template matching via cv::(gpu::)matchTemplate (using normed cross-correlation) does not work properly, as deviations in transparent regions have too much of a negative influence on the overall correlation.

Basically these examples summarize pretty well what I'm trying to achieve:

Given the template to be matched and its alpha mask:

[Image: sprite template to be matched] [Image: template alpha mask]

I'd like a high, near-100% match on images like these (arbitrary background, random stuff in transparent regions, partially occluded):

[Image: sprite on white background] [Image: sprite on black background] [Image: arbitrary background, transparent region partially occluded] [Image: partially occluded]

However, images like these should only yield a very low percentage (wrong color, entirely black):

[Image: wrong color] [Image: entirely black]

Currently I'm using edge detection to get some decent matches (Canny + cross-correlation) but as you can see, depending on the background, edges may or may not be present in the image, which produces unreliable results and generally matches very "edgy" areas.
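For reference, that edge-based fallback boils down to something like the following CPU sketch; the Canny thresholds and the function name are placeholders, not values from my actual code:

```cpp
#include <opencv2/opencv.hpp>

// Rough sketch of the current Canny + cross-correlation approach.
// Inputs are assumed 8-bit grayscale; thresholds 50/150 are arbitrary.
cv::Mat edgeMatch(const cv::Mat& screenshot, const cv::Mat& templ)
{
    cv::Mat imgEdges, tplEdges, response;
    cv::Canny(screenshot, imgEdges, 50, 150);
    cv::Canny(templ, tplEdges, 50, 150);

    // Correlate the edge maps; any strongly "edgy" region of the
    // screenshot can score highly, which is exactly the unreliability
    // described above.
    cv::matchTemplate(imgEdges, tplEdges, response, cv::TM_CCORR_NORMED);
    return response;
}
```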

I've done some math to come up with an alpha-dependent normed cross-correlation (basically pre-multiplying both the template and the image by the alpha mask), which works fine on paper but is nearly impossible to implement with good performance. And yes, performance is indeed an issue: multiple sprites (~10) have to be matched in near real time (~10 FPS) to keep up with the program's speed.

I'm sort of running out of ideas here. Are there any standard approaches to this? Any ideas or suggestions?


1 Answer


So I finally managed to solve this myself using some math and CUFFT. I dubbed it "Alpha-weighted normed cross-correlation". Here it is:

The standard normed cross-correlation (OpenCV's CV_TM_CCORR_NORMED) is

$$C_{T,I}(x,y) = \frac{\sum_{x',y'} T(x',y')\, I(x+x',y+y')}{\sqrt{\sum_{x',y'} T(x',y')^2 \cdot \sum_{x',y'} I(x+x',y+y')^2}}$$

The alpha-weighted version pre-multiplies both the template and the corresponding image window by the template's alpha mask $\alpha$, so the weight enters every sum squared:

$$C_{\alpha,T,I}(x,y) = \frac{\sum_{x',y'} \alpha(x',y')^2\, T(x',y')\, I(x+x',y+y')}{\sqrt{\sum_{x',y'} \left(\alpha(x',y')\, T(x',y')\right)^2 \cdot \sum_{x',y'} \left(\alpha(x',y')\, I(x+x',y+y')\right)^2}}$$

Pixels with $\alpha(x',y') = 0$ drop out of both the numerator and the denominator, so content in transparent regions no longer affects the score.

With this method I get very good matches (> 0.99) for cases like the ones in the OP. It further helps to threshold the alpha mask to simply ignore mostly-transparent pixels.
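For illustration, here is a minimal spatial-domain sketch of the formula above using plain OpenCV. It assumes single-channel CV_32F inputs in [0,1]; `alphaWeightedMatch` is a made-up name, and this is not the CUFFT code from the pastebin linked in the comments below:

```cpp
#include <opencv2/opencv.hpp>

// Spatial-domain sketch of the alpha-weighted normed cross-correlation.
// Each sliding-window sum in the formula is itself a plain (unnormalized)
// cross-correlation, so cv::matchTemplate with TM_CCORR does the heavy
// lifting. To ignore mostly-transparent pixels entirely, binarize the
// alpha mask beforehand (e.g. with cv::threshold and THRESH_BINARY).
cv::Mat alphaWeightedMatch(const cv::Mat& image,  // target screenshot
                           const cv::Mat& templ,  // sprite template
                           const cv::Mat& alpha)  // template alpha mask
{
    cv::Mat a2 = alpha.mul(alpha);        // alpha^2 (alpha hits both factors)

    // Numerator: sliding sum of alpha^2 * T * I.
    cv::Mat num;
    cv::matchTemplate(image, a2.mul(templ), num, cv::TM_CCORR);

    // Denominator, first factor: sum of (alpha * T)^2 -- a constant.
    double sumT2 = cv::sum(a2.mul(templ.mul(templ)))[0];

    // Denominator, second factor: sliding sum of (alpha * I)^2.
    cv::Mat sumI2;
    cv::matchTemplate(image.mul(image), a2, sumI2, cv::TM_CCORR);

    cv::Mat denom = sumI2 * sumT2;
    cv::sqrt(denom, denom);
    denom += 1e-12;                       // guard against division by zero
    return num / denom;                   // element-wise normalization
}
```

A peak in the returned map (e.g. located with cv::minMaxLoc) then gives the best match position.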

  • Hey, this sounds very interesting. Do you have any implemented code for this? Also, how different is it from OpenCV's fourth matching method (method=CV_TM_CCORR_NORMED) in this (http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html#which-are-the-matching-methods-available-in-opencv) link? Can you please explain? – Sai Nikhil Nov 09 '13 at 21:10
  • I haven't worked on this for over a year now, so pardon if I'm a little rusty on this. I was able to dig up the code for this though: [link](http://pastebin.com/eZDRpmDm). That should be the essential part of it I think. The main difference is that my version pre-multiplies template image and the respective part of the target image by the template's alpha value. This makes more transparent pixels contribute less to the matching process. You can see the difference when you compare C_{T,I} (which is CV_TM_CCORR_NORMED) with C_{alpha,T,I}. – Opossum Nov 10 '13 at 02:44
  • Thank you very much for the help. I have never used CUFFT, but I have been using OpenCV until now. Can you please tell me which additional header files and sources are needed to compile the above code? Also, what is the trick in pre-multiplying the target image by the same alpha? How does that really help us, given that we are losing some information there? I want to learn the underlying concepts (FFTs in image processing) behind this implementation. Please give me some good pointers. Thanks in advance. – Sai Nikhil Nov 10 '13 at 09:13
  • Omitting information is the point, really. However, you don't pre-multiply the entire target image; you pre-multiply the current subimage that you are "sliding" your template image over. You would have to do this in a loop in the spatial domain. In the frequency domain you can use a simple component-wise product of the (complex-conjugate) template image and the subimage of the target (convolution theorem); see the sketch after these comments. Check out the Wikipedia articles on cross-correlation and the convolution theorem; that's all you need. The only non-OpenCV header needed is cufft.h. You can initialize CUDA by simply creating a GpuMat. – Opossum Nov 10 '13 at 16:29
  • You look like an expert on this subject. I'm just a beginner. Please suggest some good resources for learning this image-processing material, like FFTs on images; if you have an e-copy, kindly mail it to tsnlegend at gmail dot com. Thanks in advance. – Sai Nikhil Nov 10 '13 at 20:31
  • I'm far from being an expert on this. I had a hard time understanding this post and code myself when I went back to it yesterday. Anyway, you should really try to understand the concept of convolution/[cross-correlation](http://en.wikipedia.org/wiki/Cross-correlation). Also check [this site](http://homepages.inf.ed.ac.uk/rbf/HIPR2/fourier.htm) for an introduction to FFT/DFTs. I found the best way to understand the formulas (like OpenCV's cross-correlation) is to evaluate a few examples manually for very small images, like a 2x2 texel template image in a 4x4 target image (without using FFT). – Opossum Nov 10 '13 at 21:11
  • The sad disadvantage of your code is that it requires an nVidia graphics processor. It will not work on an ordinary PC. – Elmue Sep 29 '14 at 13:28
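To make the frequency-domain route from the comments concrete, here is a minimal CPU sketch using cv::dft and cv::mulSpectrums in place of the CUFFT calls; the function name and the padding strategy are illustrative assumptions, not the linked pastebin code:

```cpp
#include <opencv2/opencv.hpp>

// Unnormalized cross-correlation via the convolution theorem:
// FFT both inputs, multiply the image spectrum by the complex conjugate
// of the template spectrum, then inverse-FFT the product.
cv::Mat fftCrossCorrelate(const cv::Mat& image, const cv::Mat& templ)
{
    // Zero-pad both inputs to a common DFT-friendly size.
    int h = cv::getOptimalDFTSize(image.rows + templ.rows - 1);
    int w = cv::getOptimalDFTSize(image.cols + templ.cols - 1);
    cv::Mat I = cv::Mat::zeros(h, w, CV_32F);
    cv::Mat T = cv::Mat::zeros(h, w, CV_32F);

    cv::Mat If, Tf;
    image.convertTo(If, CV_32F);
    templ.convertTo(Tf, CV_32F);
    If.copyTo(I(cv::Rect(0, 0, If.cols, If.rows)));
    Tf.copyTo(T(cv::Rect(0, 0, Tf.cols, Tf.rows)));

    cv::Mat FI, FT, spec, corr;
    cv::dft(I, FI, cv::DFT_COMPLEX_OUTPUT);
    cv::dft(T, FT, cv::DFT_COMPLEX_OUTPUT);

    // Component-wise product with the conjugated template spectrum;
    // the conjugation turns convolution into correlation.
    cv::mulSpectrums(FI, FT, spec, 0, /*conjB=*/true);
    cv::dft(spec, corr, cv::DFT_INVERSE | cv::DFT_SCALE | cv::DFT_REAL_OUTPUT);
    return corr;   // correlation peaks mark candidate match positions
}
```

The same structure maps onto CUFFT's real-to-complex and complex-to-real transforms on the GPU, and the per-window normalization from the formula in the answer still has to be applied on top of this raw correlation.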