Quickly compute average color of all the clusters generated by connectedComponents

Question

I split an image into clusters with cv2.connectedComponentWithStats. Now I want to observe color changes in each of the clusters.

Currently my code looks like this:

#...Calculation of the mask

res,labels,stats,centroids = cv2.connectedComponentsWithStats(im_mask)

def compute_average(frame,i):
   data=frame[labels==i].mean(axis=0)
   return (data[2]-data[1]) # difference between red and green channel is meaningful for me

while True:
   frame=capture.read()
   if(not frame[0]):
      break
   start_time=time.time()
   measurements = [ compute_average(frame,i) for i in range(1,len(centroids))]
   print("Computing took",time.time()-start_time

It appears that calculating measurements took almost 1.5 second for each frame (I have approximately 300 clusters of 200-600 pixels each). This is unacceptable.

It seems that by cleverly choosing numpy algorithm for computing average I can get much better performance. In particular it should be possible to compute average for all clusters simultaneously. However, I'm stuck here.

Is there a way to sort pixels of image into groups according to their label?

The bottleneck is more or less obvious: 300 passes over 1920x1080 array instead just single pass to compute all averages. — begemotv2718, Jan 03 '20 at 14:16
@Divakar I bet that it's the 300 passes over the label array to start with. It would be easy to write a single pass algo in C++. Python is gonna be trickier. — Dan Mašek, Jan 03 '20 at 14:17
@DanMašek Indeed this would be easy in C++. But it will take time to translate everything else to it. I still have hope that there is something opposite to numpy.choose in numpy library (generating multiple arrays from single one and index array). Then it would be easy. — begemotv2718, Jan 03 '20 at 14:20
Can you post a sample image and minimal complete script to repro this, so we can focus on the algorithm in question and have something to benchmark? Well, you can always just write a small module in C++ to provide this functionality, or perhaps Cython might help optimize it. — Dan Mašek, Jan 03 '20 at 14:22
Here's my first attempt: https://pastebin.com/bgphPYUv -- i made a test mask with ~450 blobs, and original runs in 2.6 seconds per frame, this one at 0.17 seconds per frame. Results match. I see some potential to speed this approach up a bit further. — Dan Mašek, Jan 03 '20 at 15:16
@begemotv2718 Check out the latest update to my answer. By reusing some of the data that stays the same between iterations, and taking advantage of OpenCV's apparently faster `mean`, i got the `np.bincount` approach beat by at least 50%. — Dan Mašek, Jan 03 '20 at 19:10

Divakar · Accepted Answer · 2020-01-03T20:06:53.383

Seems like a perfect use-case to leverage np.bincount which is a pretty efficient way to compute binned summations and weighted summations with its optional second argument. In our case here, we will use the labels as bins and get the summations as the counts and then frame as the weights as the optional weights arg.

Hence, we will have a vectorized and hopefully more efficient way, like so -

def bincount_method(frame, labels):
    f = frame.reshape(-1,3)
    cn1 = np.bincount(labels.ravel(), f[:,1])
    cn2 = np.bincount(labels.ravel(), f[:,2])
    return (cn2[1:]-cn1[1:])/stats[1:,-1]

Benchmarking

For the timings, we will re-use @Dan Mašek's test-setup image. We are using Python 3.6.8.

import numpy as np
import cv2
import urllib.request as ur

# Setup
url = 'https://i.stack.imgur.com/puxMo.png'
s = ur.urlopen(url)
url_response = ur.urlopen(url)
img_array = np.array(bytearray(url_response.read()), dtype=np.uint8)
img = cv2.imdecode(img_array, -1)
im_mask = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

res,labels,stats,centroids = cv2.connectedComponentsWithStats(im_mask)

np.random.seed(0)
frame = np.random.rand(im_mask.shape[0],im_mask.shape[1],3)

Timings -

# Original soln by OP
In [5]: %timeit [compute_average(frame,i) for i in range(1,len(centroids))]
2.38 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# @Dan Mašek's soln
In [3]: %timeit rg_mean_diff_per_label(frame, labels)
92.5 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Solution with bincount
In [4]: %timeit bincount_method(frame, labels)
30.1 ms ± 82.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Allowing pre-computing on labels

Well the bincount one has a small modification for that :

L = (labels.ravel()[:,None]+np.arange(2)*res).ravel()

def bincount_method_with_precompute(frame, L):
    p = np.bincount(L,frame[...,1:].ravel())
    return (p[res+1:]-p[1:res])/stats[1:,-1]

Comparing against @Dan Mašek's solution with pre-compute on the same setup :

In [4]: %timeit bincount_method_with_precompute(frame, L)
25.1 ms ± 326 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit rg_mean_diff_per_label(frame, label_indices)
20.3 ms ± 432 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

BTW, the first line can be skipped, since the count is already returned in `stats`. ... `c = stats[:,4]` — Dan Mašek, Jan 03 '20 at 16:53
@DanMašek Yup, figured that one and then was adding some timings. Also, got the average done at the final stage, hopefully should have improved timings further. — Divakar, Jan 03 '20 at 17:04
Seems I found a way to beat this with a bit of "cheating" (see updates) :) How does that compare to your version over there? (OpenCV might be using multithreading, i'm not really sure at the moment) — Dan Mašek, Jan 03 '20 at 19:15
Interesting update, now I'll have to try to grok that. Maybe if you could elaborate on what on earth is happening there... right now it seems a bit dense, especially for the not so experienced ones. — Dan Mašek, Jan 03 '20 at 20:14
@DanMašek Well good job on your updates. Pretty interesting optimizations there. On my edits, just the label now extends both of those channnels, so that the bins would cover both. And we do this with offsetting for the second channel labels. More info - https://stackoverflow.com/a/46256361/. — Divakar, Jan 03 '20 at 20:17
Yeah, I broke it down into the simplest steps and went through it in the interpreter and now I get it. What I'm thinking now is that the `frame[...,1:].ravel()` involves a copy. Would it now be maybe faster to do the `bincount` on all 3 channels? Those big allocations tend to be noticeable. — Dan Mašek, Jan 03 '20 at 20:29
Hmm, seems like both approaches take roughly the same time here https://pastebin.com/cNJVj2Up -- interestingly both of those approaches are slower than the non-pre-computed one here. Might be that i'm still on numpy 1.13 here. — Dan Mašek, Jan 03 '20 at 20:44

Dan Mašek · Answer 2 · 2020-01-03T20:10:52.400

The following algorithm comes to mind. This runs in about 0.115 seconds compared to 2.65 seconds of the original. (For comparison @Divakar's solution runs in about 0.058 seconds)

Flatten the labels into a linear array. (Note: Better use ravel() since apparently flatten() always makes a copy)
Perform an indirect sort of the labels to get a list of indices.
Use this to sort the labels.
Find the first occurrence of each label in the sorted list (where two neighbor labels differ)
Then for each label
- Get the range of indices corresponding to this label
- Select the corresponding pixels, calculate mean of columns 1 and 2
- Store the difference

Code:

import numpy as np

def rg_mean_diff_per_label(frame, labels):
    flat_labels = labels.ravel()
    order = flat_labels.argsort()
    sorted_labels = flat_labels[order]
    # Find the position of last occurence of each label
    boundaries = np.where(sorted_labels[:-1] != sorted_labels[1:])[0]
    # And turn it into position of the first occurence each label
    # (except 0, which we want to ignore anyway, as it represents the background)
    boundaries += 1
    # Add position of end of array, so we can simply make ranges using zip(...)
    boundaries = np.append(boundaries, len(flat_labels))

    flat_pixels = frame.reshape(-1, 3) # One pixel per row, 3 columns, 1 per colour channel

    measurements = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        indices = order[start:end]
        # NB: We don't care about blue, skip it
        data = flat_pixels[indices,1:]
        means = data.mean(axis=0)
        measurements.append(means[1] - means[0])

    return measurements

Test image:

Colour data was randomized, the values there don't matter for performance evaluation.

Since you use the same mask for all the frames, i.e. labels remain the same, we can avoid a lot of repeated calculations by refactoring this a little, and caching the parts that stay the same.

Let's look at the profile of the function:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    88                                           def rg_mean_diff_per_label(frame, labels):
    89         1        109.0    109.0      0.0      flat_labels = labels.ravel()
    90         1     592417.0 592417.0     35.1      order = flat_labels.argsort()
    91         1     107003.0 107003.0      6.3      sorted_labels = flat_labels[order]
    93         1      38591.0  38591.0      2.3      boundaries = np.where(sorted_labels[:-1] != sorted_labels[1:])[0]
    96         1        364.0    364.0      0.0      boundaries += 1
    98         1        666.0    666.0      0.0      boundaries = np.append(boundaries, len(flat_labels))
   100         1         61.0     61.0      0.0      flat_pixels = frame.reshape(-1, 3) # One pixel per row, 3 columns, 1 per colour channel
   102         1         25.0     25.0      0.0      measurements = []
   103       459      11182.0     24.4      0.7      for start, end in zip(boundaries[:-1], boundaries[1:]):
   104       458      17117.0     37.4      1.0          indices = order[start:end]
   106       458     314348.0    686.3     18.6          data = flat_pixels[indices,1:]
   107       458     579712.0   1265.7     34.4          means = data.mean(axis=0)
   108       458      24001.0     52.4      1.4          measurements.append(means[1] - means[0])
   110         1         21.0     21.0      0.0      return measurements

The only thing that changes per iteration is the pixel data, so if we pre-calculate an array of indices of all pixels belonging to each label, we should be able to cut the per-iteration time down by another 40% or so. (This one runs at around 0.057 seconds per iteration but competition has since improved :) )

Code:

def precalc_label_indices(labels):
    flat_labels = labels.ravel()
    order = flat_labels.argsort()
    sorted_labels = flat_labels[order]
    # Find the position of last occurence of each label
    boundaries = np.where(sorted_labels[:-1] != sorted_labels[1:])[0]
    # And turn it into position of the first occurence each label
    # (except 0, which we want to ignore anyway, as it represents the background)
    boundaries += 1
    # Add position of end of array, so we can simply make ranges using zip(...)
    boundaries = np.append(boundaries, len(flat_labels))

    label_indices = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        indices = order[start:end]
        indices = np.sort(indices)) # Access in order can't hurt
        label_indices.append(indices)
    return label_indices

label_indices = precalc_label_indices(labels)

def rg_mean_diff_per_label(frame, label_indices):
    flat_pixels = frame.reshape(-1, 3) # One pixel per row, 3 columns, 1 per colour channel

    measurements = []
    for indices in label_indices:
        # NB: We don't care about blue, skip it
        data = flat_pixels[indices,1:]
        means = data.mean(axis=0)
        measurements.append(means[1] - means[0])

    return measurements

Now, since you already use OpenCV, why not take advantage of it here? Calculation of the mean seems to be the biggest bottleneck now. As it turns out, the OpenCV mean() is much faster here.

def rg_mean_diff_per_label(frame, label_indices):
    flat_pixels = frame.reshape(-1, 3) # One pixel per row, 3 columns, 1 per colour channel

    measurements = []
    for indices in label_indices:
        # NB: We don't care about blue, skip it
        data = flat_pixels[indices,1:]
        means = cv2.mean(data.reshape(-1,1,2)) # This one works per-channel
        measurements.append(means[1] - means[0])

    return measurements

This algorithm results in slightly different output (differences on the order of 1e-14 max). However, it now runs in 0.021 seconds per iteration (excluding the pre-calculation which is insignificant in long-run).

Let's look at the profile of the new function.

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    32                                           def rg_mean_diff_per_label(frame, label_indices):
    33        16       3314.0    207.1      0.1      flat_pixels = frame[...,1:].reshape(-1, 2)
    35        16        464.0     29.0      0.0      measurements = []
    36      7344     151147.0     20.6      2.7      for indices in label_indices:
    38      7328    4574161.0    624.2     81.6          data = flat_pixels[indices]
    39      7328     640719.0     87.4     11.4          means = cv2.mean(data.reshape(-1,1,2))
    40      7328     237797.0     32.5      4.2          measurements.append(means[1] - means[0])
    42        16        413.0     25.8      0.0      return measurements

Now the biggest bottleneck is the extraction of pixels that belong to the given label.

The fact that cv2.mean supports a mask parameter gave me another idea. Let's take advantage of the stats, since they contain the bounding box and pixel count for each label. We can pre-generate ROI-sized mask images for each label. Then for each label, we extraact the ROI (based on the bounding box), and calculate a mean of it with appropriate mask. This time we calculate mean for all three channels, in order to avoid copies of the pixel data.

label_data = []
for label in range(res):
    mask = cv2.inRange(labels, label, label)
    x,y,w,h,n = stats[label]
    roi = mask[y:y+h,x:x+w]
    label_data.append((x,y,x+w,y+h,roi))

def test_v4(frame, label_data):
    measurements = []
    for x1,y1,x2,y2,mask in label_data[1:]:
        roi = frame[y1:y2,x1:x2,:]
        means = cv2.mean(roi, mask)
        measurements.append(means[2] - means[1])
    return measurements

This runs in 0.007 seconds per iteration, again excluding the pre-calculation time.

This is using the single test image I created.

Note: I have an idea for a bad-case (3-pixel wide diagonal stripes) scenario where the masks would be very large for the number of pixels contained. I'll provide an update soon.

@begemotv2718 Note: Since in your case the connected component data remains same across iterations, we could just precalculate the `indices` for each label and then reuse them. That would reduce the cost of the function by another ~40%. — Dan Mašek, Jan 03 '20 at 18:15
@begemotv2718 Thanks :) And I just figured out a different approach that's 3x faster. Give me a moment to write it up. | Basically avoids most of numpy, we pregenerate ROI bounds, and ROI-shaped mask for each label. Then use `cv2.mean` with appropriate array view and matching mask. — Dan Mašek, Jan 03 '20 at 19:44
@begemotv2718 For interest, [here](https://pastebin.com/4rHuLuc7)'s a Cython-based single pass implementatin that runs at around 3.5 milliseconds. (so another 50% improvement, and this is definitely single-threaded, so more could be possibly gained by parallelizing it). | But enough for today, I'll see if I can make a write-up about that tomorrow. — Dan Mašek, Jan 03 '20 at 22:47

Quickly compute average color of all the clusters generated by connectedComponents

2 Answers2

Benchmarking