
I have two Numpy arrays (3-dimensional uint8) converted from PIL images.

I want to find out whether the first image contains the second image and, if so, the coordinates of the top-left pixel of the match inside the first image.

Is there a way to do that purely in NumPy, fast enough, rather than with very slow pure Python loops (four of them, nested)?

2D example:

import numpy

a = numpy.array([
    [0, 1,  2,  3],
    [4, 5,  6,  7],
    [8, 9, 10, 11]
])
b = numpy.array([
    [2, 3],
    [6, 7]
])

How can I do something like this?

position = a.find(b)

position would then be (0, 2).
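For reference, the `a.find(b)` behaviour described above can be sketched in pure NumPy with a sliding-window comparison. This is not taken from any of the answers; it assumes NumPy 1.20 or newer (for `sliding_window_view`), and `find_subarray` is a hypothetical helper name. It only finds exact matches.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_subarray(a, b):
    """Return (row, col) of the first exact occurrence of b inside a, or None.

    Works for 2-D arrays and for (H, W, C) images, provided b has the same
    number of channels as a.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # Every window of a with b's shape, as a read-only view (no copy).
    # 2-D: shape (H-h+1, W-w+1, h, w); 3-D: (H-h+1, W-w+1, 1, h, w, C).
    windows = sliding_window_view(a, b.shape)
    # Compare each window with b and require every element to be equal.
    # Note: this comparison materializes a boolean array as big as `windows`.
    equal = (windows == b).all(axis=tuple(range(-b.ndim, 0)))
    matches = equal.reshape(windows.shape[:2])
    hits = np.argwhere(matches)  # row-major order: hits[0] is top-most, then left-most
    if len(hits) == 0:
        return None
    row, col = hits[0]
    return int(row), int(col)


a = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
b = np.array([[2, 3], [6, 7]])
print(find_subarray(a, b))  # (0, 2)
```

The temporary boolean array has (H-h+1)·(W-w+1)·h·w·C elements, so for large images the FFT- or OpenCV-based approaches in the answers will scale better.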

Saullo G. P. Castro
Etienne Perot

5 Answers


I'm doing this with OpenCV's matchTemplate function. OpenCV has an excellent Python binding that uses NumPy internally, so images are just NumPy arrays. For example, let's assume you have a 100x100 pixel BGR file, testimage.bmp. We take a 10x10 sub-image at position (30, 30) and find it in the original.

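import cv2
import numpy as np

image = cv2.imread("testimage.bmp")  # BGR uint8 array of shape (100, 100, 3)
template = image[30:40, 30:40, :]    # 10x10 patch: rows 30-39, columns 30-39

# result[y, x] is the match score with the template's top-left corner at (x, y)
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
# With TM_CCOEFF_NORMED the best match is the maximum; the index is (row, col)
row, col = np.unravel_index(result.argmax(), result.shape)
print((int(row), int(col)))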

Output:

(30, 30)

You can choose between several algorithms to match the template to the original; cv2.TM_CCOEFF_NORMED is just one of them. See the documentation for details: some algorithms (cv2.TM_SQDIFF and cv2.TM_SQDIFF_NORMED) indicate matches as minima in the result array, the others as maxima. A word of warning: OpenCV uses BGR channel order by default, so be careful, e.g. when you compare an image loaded with cv2.imread to an image you converted from PIL (RGB) to NumPy. You can always use cv2.cvtColor to convert between formats.
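If you would rather not call cv2.cvtColor, the RGB/BGR swap for 3-channel images is just a reversal of the last axis in NumPy. A minimal sketch; `swap_red_blue` is a hypothetical helper, and it deliberately rejects 4-channel (RGBA) arrays, where reversing the axis would also move alpha.

```python
import numpy as np


def swap_red_blue(img):
    """Convert an (H, W, 3) image from RGB to BGR (or back; the swap is its own inverse)."""
    img = np.asarray(img)
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) array")
    # img[:, :, ::-1] is a view with a negative stride; copying it into a
    # C-contiguous array avoids problems with functions that reject such views.
    return np.ascontiguousarray(img[:, :, ::-1])
```

Typical use would be something like `swap_red_blue(np.asarray(pil_image.convert("RGB")))` before comparing with an image from cv2.imread.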

To find all matches above a given threshold confidence, I use something along the lines of this to extract the matching coordinates from my result array:

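confidence = 0.9  # example threshold; pick one that suits your images
# flat indices of all scores above the threshold, converted back to (row, col)
match_indices = np.arange(result.size)[(result > confidence).flatten()]
np.unravel_index(match_indices, result.shape)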

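This gives a tuple of two arrays: the row indices and the column indices of all matches. Taken together, the i-th elements of both arrays form one matching (row, column) coordinate.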
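The same tuple can be obtained more directly with np.where (or np.nonzero). A small check on a made-up score map standing in for a real matchTemplate result:

```python
import numpy as np

# Hypothetical stand-in for a cv2.matchTemplate result: a float32 score map.
result = np.array([[0.10, 0.95, 0.20],
                   [0.30, 0.40, 0.97]], dtype=np.float32)
confidence = 0.9  # example threshold

# Flat indices of the matches, converted back to (row, col).
match_indices = np.arange(result.size)[(result > confidence).flatten()]
rows_a, cols_a = np.unravel_index(match_indices, result.shape)

# Equivalent and shorter: np.where returns the same (rows, cols) tuple.
rows_b, cols_b = np.where(result > confidence)

# Pair them up as (row, col) = (y, x) top-left corners of the matches.
matches = list(zip(rows_b.tolist(), cols_b.tolist()))
print(matches)  # [(0, 1), (1, 2)]
```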

PiQuer
  • Much more complete answer, thanks~ Wish I could change the chosen answer but I can't – Etienne Perot Feb 14 '12 at 00:30
  • Just out of curiosity and not to steal some rep from tom10 ;), why can't you change the accepted answer? I am new to stackoverflow, but in my own first question which I posted it indicates that I can "toggle" the accepted answer, and other questions on meta.stackoverflow.com show that it *should* be possible to re-accept. – PiQuer Feb 14 '12 at 08:41
  • **edit**: Because the account I asked this question with is not the same as my current account. I lost control of the OpenID domain used to log in to that other account, so I can't log into it and change it either. – Etienne Perot Feb 14 '12 at 16:26
  • gotta say this is the most superior solution that I found on this topic. well worth investing the time in OpenCV. Thank you – Muppet Jun 17 '12 at 02:24
  • @PiQuer what does the line 'template = image[30:40,30:40,:]' do? – Mark Corrigan Dec 29 '14 at 01:55
  • @MarkCorrigan this is Python's [slicing notation](http://stackoverflow.com/a/509295). `template` will be a smaller patch of the original image: rows 30 to 39 (`30:40` in the first coordinate, i.e. `y`), columns 30 to 39 (`30:40` in the second coordinate, i.e. `x`) and all three color channels (`:` in the third coordinate). – PiQuer Jan 01 '15 at 13:15
  • Has anyone run negative tests on this solution? I tried searching for a totally different image in the original image, and it still returns (x, y) coordinates. – Rod Maniego Apr 16 '20 at 11:49

This can be done using scipy's correlate2d and then using argmax to find the peak in the cross-correlation.

Here's a more complete explanation of the math and ideas, and some examples.
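One caveat with raw cross-correlation plus argmax is that bright regions dominate: on the question's example, the peak of the unnormalized 'valid' correlation lies at (1, 2), where the values are largest, not at (0, 2). Normalizing each window fixes that bias. The sketch below computes zero-mean normalized cross-correlation in pure NumPy with `sliding_window_view` (NumPy 1.20+) instead of scipy; `ncc_map` is a hypothetical name, and the score corresponds closely to what OpenCV's TM_CCOEFF_NORMED computes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ncc_map(image, template):
    """Zero-mean normalized cross-correlation of a 2-D template over a 2-D image.

    Returns an array of shape (H-h+1, W-w+1) with values in [-1, 1]; an exact
    match scores 1.0. Windows with zero variance (flat regions) score 0.
    """
    img = np.asarray(image, dtype=np.float64)
    t = np.asarray(template, dtype=np.float64)
    t = t - t.mean()
    windows = sliding_window_view(img, t.shape)  # (H-h+1, W-w+1, h, w) view
    win = windows - windows.mean(axis=(-2, -1), keepdims=True)
    num = (win * t).sum(axis=(-2, -1))
    den = np.sqrt((win ** 2).sum(axis=(-2, -1)) * (t ** 2).sum())
    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=den > 0)  # leave 0 where den == 0
    return out


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 30))
template = image[5:9, 7:12]  # 4x5 patch whose top-left corner is (5, 7)
scores = ncc_map(image, template)
row, col = np.unravel_index(scores.argmax(), scores.shape)
print(int(row), int(col))
```

Like the previous approaches, argmax always returns some position, so a threshold on `scores.max()` is still needed to decide whether there is a match at all.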

If you want to stay in pure NumPy and not even use scipy, or if the images are large, you'd probably be best off using an FFT-based approach to the cross-correlations.
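A minimal pure-NumPy sketch of the FFT route for exact matching, assuming 2-D integer-valued images. It expands the sum of squared differences as SSD = Σwindow² − 2·corr + Σb², computes the correlation term with `np.fft.rfft2`/`irfft2`, and gets the window sums of a² from an integral image (cumulative sums, not an FFT). The function name and the 0.5 tolerance are my own choices for illustration.

```python
import numpy as np


def find_exact_fft(a, b):
    """All top-left (row, col) positions where 2-D array b occurs exactly in a.

    Uses SSD(i, j) = sum(window**2) - 2 * corr(i, j) + sum(b**2); the
    cross-correlation term corr is computed with real FFTs.
    """
    a = np.asarray(a, dtype=np.float64)  # float64: squaring uint8 would overflow
    b = np.asarray(b, dtype=np.float64)
    H, W = a.shape
    h, w = b.shape
    # Pad to the full linear-convolution size so the FFT product does not wrap around.
    shape = (H + h - 1, W + w - 1)
    # Correlating with b is the same as convolving with b flipped on both axes.
    spectrum = np.fft.rfft2(a, shape) * np.fft.rfft2(b[::-1, ::-1], shape)
    full = np.fft.irfft2(spectrum, shape)
    corr = full[h - 1:H, w - 1:W]  # "valid" part, shape (H-h+1, W-w+1)

    # Sum of a**2 over every h x w window, via an integral image (cumulative sums).
    c = np.zeros((H + 1, W + 1))
    c[1:, 1:] = (a ** 2).cumsum(axis=0).cumsum(axis=1)
    win_sq = c[h:, w:] - c[:-h, w:] - c[h:, :-w] + c[:-h, :-w]

    ssd = win_sq - 2.0 * corr + (b ** 2).sum()
    # For integer-valued images the true SSD is an integer, so anything below
    # 0.5 is an exact match; the margin absorbs floating-point FFT error.
    rows, cols = np.nonzero(ssd < 0.5)
    return list(zip(rows.tolist(), cols.tolist()))


a = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
b = np.array([[2, 3], [6, 7]])
print(find_exact_fft(a, b))  # [(0, 2)]
```

For (H, W, C) images, the SSD maps of the individual channels can simply be added before thresholding. For very large images the floating-point error grows, so the 0.5 margin is worth re-checking there.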

Edit: The question specifically asked for a pure NumPy solution. But if you can use OpenCV or other image-processing tools, it's obviously easier to use one of them. PiQuer's answer gives an example of this, and I'd recommend it if you can use it.

tom10

I just finished writing a standalone implementation of normalized cross-correlation for N-dimensional arrays. You can get it from here.

Cross-correlation is calculated either directly, using scipy.ndimage.correlate, or in the frequency domain, using scipy.fftpack.fftn/ifftn, depending on which will be quicker for the given input sizes.

ali_m
  • Sorry for the accidental downvote. (Mobile device.) If you edit the question I will undo my downvote. (Can't at the moment because it's locked in.) – funroll Nov 08 '13 at 23:48

You can actually reduce this problem to a simple string search with a regex. The following implementation accepts two PIL.Image objects and finds the coordinates of the needle within the haystack. This is about 127x faster than a pixel-by-pixel search.

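import re


def subimg_location(haystack, needle):
    """Return (left, top) of needle inside haystack (both PIL.Image), or None."""
    haystack = haystack.convert('RGB')
    needle = needle.convert('RGB')

    # Raw RGB bytes, 3 per pixel, row by row
    haystack_str = haystack.tobytes()
    needle_str = needle.tobytes()

    hw, nw = haystack.size[0], needle.size[0]  # widths in pixels
    # Bytes to skip between the end of one needle row and the start of the next
    gap_size = (hw - nw) * 3
    gap_regex = b'.{' + str(gap_size).encode() + b'}'

    # Split the needle into rows of nw * 3 bytes each
    chunk_size = nw * 3
    split = [needle_str[i:i + chunk_size] for i in range(0, len(needle_str), chunk_size)]

    # Build the regex: row 0, gap, row 1, gap, ...
    regex = re.escape(split[0])
    for row in split[1:]:
        regex += gap_regex + re.escape(row)

    # DOTALL so that '.' also matches byte value 10 ('\n') inside the gaps
    p = re.compile(regex, re.DOTALL)

    pos = 0
    while True:
        m = p.search(haystack_str, pos)
        if not m:
            return None
        x = m.start()
        left = x % (hw * 3) // 3
        top = x // (hw * 3)
        # Reject matches that start mid-pixel or wrap around the right edge
        if x % 3 == 0 and left + nw <= hw:
            return (left, top)
        pos = x + 1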
Ben
  • Clever! And so lightweight compared to pulling in something like OpenCV. There is one issue with your code as is: any channel values of 10 outside the target will throw off the match because `.` doesn't match newlines by default. Fixed by prefixing the regex with `(?s)` or compiling with `re.DOTALL`. – dhaffey Jan 08 '18 at 18:48
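Since the question starts from NumPy arrays rather than PIL images, the same regex idea can be applied to the arrays' raw bytes directly. A sketch under these assumptions: both arrays are uint8 with shape (H, W) or (H, W, C) and matching C, and `find_by_regex` is a hypothetical name. Like the version above, it uses re.DOTALL and skips false hits that start mid-pixel or wrap around the right edge.

```python
import re

import numpy as np


def find_by_regex(a, b):
    """Top-left (row, col) of the first exact occurrence of b in a, or None."""
    a = np.ascontiguousarray(a, dtype=np.uint8)
    b = np.ascontiguousarray(b, dtype=np.uint8)
    H, W = a.shape[:2]
    h, w = b.shape[:2]
    px = int(np.prod(a.shape[2:], dtype=np.int64))  # bytes per pixel: 1 or C
    row_bytes = W * px
    gap = b'.{%d}' % ((W - w) * px)  # bytes between consecutive needle rows
    rows = [b[r].tobytes() for r in range(h)]
    pattern = re.compile(gap.join(re.escape(r) for r in rows), re.DOTALL)
    haystack = a.tobytes()
    pos = 0
    while True:
        m = pattern.search(haystack, pos)
        if m is None:
            return None
        x = m.start()
        top, rem = divmod(x, row_bytes)
        left, off = divmod(rem, px)
        # Accept only pixel-aligned matches that fit inside one image row.
        if off == 0 and left + w <= W:
            return top, left
        pos = x + 1
```

Note that this returns (row, col) like the NumPy-based answers, whereas `subimg_location` returns (left, top).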
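import cv2
import numpy as np

img = cv2.imread("brows.PNG")  # main image (BGR)
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

template = cv2.imread("websearch.PNG", cv2.IMREAD_GRAYSCALE)  # sub-image
w, h = template.shape[::-1]  # grayscale shape is (rows, cols) = (h, w)

result = cv2.matchTemplate(gray_img, template, cv2.TM_CCOEFF_NORMED)
loc = np.where(result >= 0.9)  # (rows, cols) of every score >= 0.9

# loc[::-1] turns (rows, cols) into (xs, ys), the (x, y) order OpenCV drawing uses
for x, y in zip(*loc[::-1]):
    x, y = int(x), int(y)  # plain ints for OpenCV's point arguments
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3)  # green, 3 px

cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()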
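To make the coordinate handling in that loop concrete: np.where returns (rows, cols), i.e. (ys, xs), and `loc[::-1]` swaps them into the (x, y) order that cv2.rectangle takes. A small illustration on a made-up score map, with no OpenCV needed:

```python
import numpy as np

# Hypothetical matchTemplate-style score map: 3 rows (y) by 4 columns (x).
result = np.array([[0.1, 0.2, 0.95, 0.3],
                   [0.0, 0.1, 0.20, 0.1],
                   [0.9, 0.1, 0.10, 0.1]])

loc = np.where(result >= 0.9)  # (array of rows, array of cols) = (ys, xs)
points = [(int(x), int(y)) for x, y in zip(*loc[::-1])]  # reversed -> (x, y)
print(points)  # [(2, 0), (0, 2)]
```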