
My task is to detect an object in a given image using OpenCV (I do not care whether it is the Python or C++ implementation). The object, shown below in three examples, is a black rectangle with five white rectangles within it. All dimensions are known.

[image: three example photos of the object]

However, the rotation, scale, distance, perspective, lighting conditions, camera focus/lens, and background of the image are not known. The edge of the black rectangle is not guaranteed to be fully visible; however, there will never be anything in front of the five white rectangles, so they will always be fully visible. The end goal is to detect the presence of this object within an image, and to rotate, scale, and crop the image to show the object with the perspective removed. I am fairly confident that I can adjust the image to crop to just the object, given its four corners. However, I am not so confident that I can reliably find those four corners. In ambiguous cases, not finding the object is preferred to misidentifying some other feature of the image as the object.

Using OpenCV I have come up with the following methods; however, I feel I might be missing something obvious. Are there any more methods available, or is one of these the optimal solution?

Edge based outline

My first idea was to look for the outside edge of the object.

Using Canny edge detection (after scaling to a known size, converting to grayscale, and applying a Gaussian blur), find the contour that best matches the outer shape of the object. This deals with perspective, colour, and size issues, but fails when there is a complicated background, for example, or when something of a similar shape to the object appears elsewhere in the image. Maybe this could be improved by a better set of rules for finding the correct contour, perhaps involving the five white rectangles as well as the outer edge.

[images: edge-based outline results on two examples]

Feature detection

The next idea was to match against a known template using feature detection.

Using ORB feature detection, descriptor matching, and homography (from this tutorial) fails, I believe because the features it detects are very similar to other features within the object (lots of corners which are precisely one-quarter white and three-quarters black). However, I do like the idea of matching to a known template; this idea makes sense to me. I suppose, though, that because the object is quite basic geometrically, it's likely to find a lot of false positives in the feature-matching step.

[image: feature-matching result]

Parallel Lines

Using HoughLines or HoughLinesP, looking for evenly spaced parallel lines. I have just started down this road, so I need to investigate the best methods for thresholding etc. While it looks messy for images with complex backgrounds, I think it may work well, as I can rely on the fact that the white rectangles within the black object should always be high contrast, giving a good indication of where the lines are.

[image: detected Hough lines]

'Barcode Scan'

My final idea is to scan the image line by line, looking for the white-to-black pattern.

I have not started on this method, but the idea is to take a strip of the image (at some angle), convert it to the HSV colour space, and look for the regular black-to-white pattern appearing five times sequentially in the Value channel. This idea sounds promising to me, as I believe it should be insensitive to many of the unknown variables.
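A rough sketch of that run-length idea on a single 1-D intensity profile (pure NumPy; the binarisation threshold and width tolerance are placeholder guesses):

```python
import numpy as np

def runs(strip, thresh=128):
    """Run-length encode a 1-D intensity profile into (is_white, length) pairs."""
    binary = strip >= thresh
    out = []
    start = 0
    for i in range(1, len(binary)):
        if binary[i] != binary[i - 1]:
            out.append((bool(binary[start]), i - start))
            start = i
    out.append((bool(binary[start]), len(binary) - start))
    return out

def has_bar_pattern(strip, n_white=5, tol=0.4):
    """Look for n_white similar-width white runs separated by black runs."""
    whites = [length for is_white, length in runs(strip) if is_white]
    # slide a window over consecutive white runs and check the widths are similar
    for i in range(len(whites) - n_white + 1):
        group = whites[i:i + n_white]
        if max(group) <= (1 + tol) * min(group):
            return True
    return False
```

A fuller version would also check that the black gaps between the five white runs are similar, and scan strips at several angles to cope with rotation.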

Thoughts

I have looked at a number of OpenCV tutorials, as well as SO questions such as this one, however because my object is quite geometrically simple I am having issues implementing the ideas given.

I feel like this is an achievable task, however my struggle is knowing which method to pursue further. I have experimented with the first two ideas quite a bit, and while I haven't achieved anything very reliable, maybe there is something I am missing. Is there a standard way of achieving this task which I have not thought of, or is one of my suggested methods the most sensible?

EDIT: Once the corners are found using one of the above methods (or some other method), I am thinking of using Hu Moments or OpenCV's matchShapes() function to remove any false positives.

EDIT2: Added some more input image examples as requested by @Timo


rbv
    The barcode approach looks very interesting. Have you tried to combine different approaches? I would use multiple algorithms and combine them with majority voting or a maxima search. – Timo Mar 26 '20 at 13:08
  • @Timo I did think about combining them yeah, then trying to evaluate some kind of confidence value from each algorithm. My initial concern to the thought of that is that spending the time to fine-tune one algorithm is long enough, let alone multiple! But I do think that it probably makes sense for robustness, thanks! – rbv Mar 26 '20 at 13:26
    Another advantage of ensemble algorithms is that you don't have to precisely fine-tune each individual one, because it's weighted against the others and is therefore not that impactful in itself. It's probably more effective to use more lightweight and easy algorithms than a few complex and fine-tuned ones. If you push this approach to the extreme (very simple, but a lot of algorithms or instances thereof) you will notice that that's the way many machine learning algorithms operate (disregarding the learning part). – Timo Mar 26 '20 at 13:44
  • I also have another idea that might work well but I don't have the time to explain it atm. Could you add a few more input images (without any markers)? – Timo Mar 26 '20 at 13:47
  • @Timo That makes sense re: multiple algorithms. I've added the images to the post. Just to note too, the three + marks on the object were intended for use as rotation/mirroring protection, for use once the chosen algorithm has identified the four corners of the object and cropped. – rbv Mar 26 '20 at 14:08
  • Check also shape-based matching https://github.com/meiqua/shape_based_matching — there are also paper links. – Andrey Smorodov Mar 26 '20 at 17:21
  • Have you tried to detect the rectangle, detect the 4 corners, rectify the perspective (via a 4-point perspective unwarp) and then match the image against a template or a set of features? – stateMachine Mar 27 '20 at 03:40
  • @eldesgraciado Thanks for the suggestion. My main concern with using the outer rectangle (as I did with the edge detection method) is that there is no guarantee that it will be fully visible. The five white rectangles will always be fully visible, so I think that it's safer to look for them in the image. However, because I know the dimensions of the object, I can perhaps use the location of the five white rectangles to estimate the position of the four corners of the object, and then unwarp as you have suggested. Thanks! – rbv Mar 27 '20 at 16:31
  • uncontrolled environment is hard to handle. You'll have to think about what kind of techniques are (mostly) invariant to what kind of deformations. E.g. edge detection is quite invariant to lighting conditions. The biggest shape distortion probably occurs from perspective distortion (if non-fisheye lens is used). If you find a way to undo the perspective effect you can probably go with stuff like chamfer matching. For uncontrolled environment, machine learning (deep learning detectors) probably are a good idea. – Micka Apr 09 '20 at 07:38
  • for your specific object I would recommend thresholding techniques to get the black and white areas. Maybe you can have a look at aruco marker detection, QR code detection or checkerboard detection, how they decide about presence. – Micka Apr 09 '20 at 07:40
  • Since you know your object is black, you can perform color thresholding with a black lower/upper range, then find contours and filter using aspect ratio + contour approximation. If the contour has a length of 4 then its a square/rectangle and you have detected your object. In addition you can use contour area filtering too – coffeewin Apr 09 '20 at 20:58

1 Answer


I had some time to look into the problem and made a little Python script. I'm detecting the white rectangles inside your shape. Paste the code into a .py file and copy all input images into an input subfolder. The final result for each image is just a dummy atm, and the script isn't complete yet. I'll try to continue it in the next couple of days. The script will create a debug subfolder where it'll save some images that show the current detection state.

import numpy as np
import cv2
import os

INPUT_DIR = 'input'
DEBUG_DIR = 'debug'
OUTPUT_DIR = 'output'
IMG_TARGET_SIZE = 1000

# each algorithm must return a rotated rect and a confidence value [0..1]: (((x, y), (w, h), angle), confidence)

def main():
    # a list of all used algorithms
    algorithms = [rectangle_detection] 

    # load and prepare images
    files = list(os.listdir(INPUT_DIR))
    images = [cv2.imread(os.path.join(INPUT_DIR, f), cv2.IMREAD_GRAYSCALE) for f in files]
    images = [scale_image(img) for img in images]

    for img, filename in zip(images, files):
        results = [alg(img, filename) for alg in algorithms]
        roi, confidence = merge_results(results)

        display = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
        display = cv2.drawContours(display, [cv2.boxPoints(roi).astype('int32')], -1, (0, 230, 0))            
        cv2.imshow('img', display)
        cv2.waitKey()


def merge_results(results):
    '''Merges all results into a single result.'''
    return max(results, key=lambda x: x[1]) 

def scale_image(img):    
    '''Scales the image so that the biggest side is IMG_TARGET_SIZE.'''
    scale = IMG_TARGET_SIZE / np.max(img.shape)
    return cv2.resize(img, (0,0), fx=scale, fy=scale)     


def rectangle_detection(img, filename):    
    debug_img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    _, binarized = cv2.threshold(img, 50, 255, cv2.THRESH_BINARY)    
    contours, _ = cv2.findContours(binarized, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    # detect all rectangles
    rois = []
    for contour in contours:
        if len(contour) < 4:
            continue
        cont_area = cv2.contourArea(contour)
        if not 1000 < cont_area < 15000: # roughly filter by the area of the detected rectangles
            continue
        (x, y), (w, h), angle = rect = cv2.minAreaRect(contour)
        rect_area = w * h
        if rect_area == 0 or cont_area / rect_area < 0.8: # check the 'rectangularity'
            continue
        rois.append(rect)

    # save intermediate results in the debug folder
    rois_img = cv2.drawContours(debug_img, contours, -1, (0, 0, 230))
    rois_img = cv2.drawContours(rois_img, [cv2.boxPoints(rect).astype('int32') for rect in rois], -1, (0, 230, 0))
    save_dbg_img(rois_img, 'rectangle_detection', filename, 1)

    # todo: detect pattern

    if not rois:
        return ((0, 0), (0, 0), 0), 0.0 # nothing detected
    return rois[0], 1.0 # dummy values


def save_dbg_img(img, folder, filename, index=0):
    '''Writes the given image to DEBUG_DIR/folder/filename_index.png.'''
    folder = os.path.join(DEBUG_DIR, folder)
    if not os.path.exists(folder):
        os.makedirs(folder)
    cv2.imwrite(os.path.join(folder, '{}_{:02}.png'.format(os.path.splitext(filename)[0], index)), img)


if __name__ == "__main__":
    main()

Here is an example image of the current WIP:

[image: current detection state]

The next step is to detect the pattern / relation between multiple rectangles. I'll update this answer when I make progress.
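One way that relation check could be sketched (not part of the script yet; a hypothetical helper that looks for five rotated rects with similar angle and similar area, with guessed tolerances):

```python
import numpy as np

def find_five_group(rois, angle_tol=10, size_tol=0.3):
    """From a list of rotated rects ((cx, cy), (w, h), angle), look for a
    group of five with similar angle and similar area."""
    for i, (_, (w, h), angle) in enumerate(rois):
        area = w * h
        group = [rois[i]]
        for j, (_, (w2, h2), angle2) in enumerate(rois):
            if i == j:
                continue
            # minAreaRect angles wrap every 90 degrees, so compare mod 90
            diff = abs(angle - angle2) % 90
            similar_angle = min(diff, 90 - diff) <= angle_tol
            similar_size = abs(w2 * h2 - area) <= size_tol * area
            if similar_angle and similar_size:
                group.append(rois[j])
        if len(group) >= 5:
            return group
    return None
```

A stricter version would also verify the spacing between the rect centres, since the five white rectangles are evenly spaced.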

Timo
    Wow, that's amazing! Thank you for putting in so much work. I have started writing code for the barcode scanning method, and will think about how to determine a confidence value. But what you have sent already is working very well, so I'm feeling much more confident about this task now. – rbv Mar 27 '20 at 15:55