How to find the centroids of pixel clusters within a binary image

Question

I've written some code in MATLAB that converts an image (of stars) into a binary image using a set threshold and then labels each cluster of pixels (stars) that is above this threshold. The labelling produces an output: e.g.

[1 1 1 0 0 0 0 0 0
 1 1 0 0 0 2 2 2 0
 0 0 0 3 3 0 2 0 0
 0 0 0 3 3 0 0 0 0]

So each cluster of 1's, 2's, 3's etc. represents a star. I used the answer provided at this link: How to find all connected components in a binary image in Matlab? to label the pixels. I also can't use the image processing toolbox. The code that I have so far is shown below.

How do I now find the centroids of each pixel cluster in the image?

clc
clear all
close all
img=imread('star.jpg');
binary_image=convert2binary(img);
imshow(binary_image);
visited = false(size(binary_image));
[rows, cols] = size(binary_image);
B = zeros(rows, cols);
ID_counter = 1;
for row = 1:rows
    for col = 1:cols
        if binary_image(row, col) == 0
            visited(row, col) = true;
        elseif visited(row, col)
            continue;
        else
            stack = [row col];

            while ~isempty(stack)
                loc = stack(1,:);
                stack(1,:) = [];

                if visited(loc(1),loc(2))
                    continue;
                end

                visited(loc(1),loc(2)) = true;
                B(loc(1),loc(2)) = ID_counter;

                [locs_y, locs_x] = meshgrid(loc(2)-1:loc(2)+1, loc(1)-1:loc(1)+1);
                locs_y = locs_y(:);
                locs_x = locs_x(:);

                out_of_bounds = locs_x < 1 | locs_x > rows | locs_y < 1 | locs_y > cols;
                locs_y(out_of_bounds) = [];
                locs_x(out_of_bounds) = [];
                is_visited = visited(sub2ind([rows cols], locs_x, locs_y));

                locs_y(is_visited) = [];
                locs_x(is_visited) = [];

                is_1 = binary_image(sub2ind([rows cols], locs_x, locs_y));
                locs_y(~is_1) = [];
                locs_x(~is_1) = [];

                stack = [stack; [locs_x locs_y]];
            end

            ID_counter = ID_counter + 1;
        end
    end
end

function [binary] = convert2binary(img)

    [x, y, z]=size(img);
    if z==3
        img=rgb2gray(img);
    end

    img=double(img);
    sum=0;
    for i=1:x
        for j=1:y
            sum=sum+img(i, j);
        end
    end


    threshold=100  % or sum/(x*y);
    binary=zeros(x,y);

    for i=1:x
        for j=1:y
            if img(i, j) >= threshold
                binary(i, j) = 1;
            else
                binary(i, j)=0;
            end
        end
    end
end

score 1 · Accepted Answer · answered May 06 '19 at 15:10

The centroid is the first order moment. It is computed by

sum(x*v)/sum(v) , sum(y*v)/sum(v)

For a binary image you can do this (I'm using a trivial loop, not vectorized code, so we can extend it later easily):

img = [1 1 1 0 0 0 0 0 0
       1 1 0 0 0 2 2 2 0
       0 0 0 3 3 0 2 0 0
       0 0 0 3 3 0 0 0 0]; % Op's example data

bin = img==1;              % A binary image

% Algorithm
sum_v = 0;
sum_iv = 0;
sum_jv = 0;
for jj=1:size(bin,2)
   for ii=1:size(bin,1)
      sum_v = sum_v + bin(ii,jj);
      sum_iv = sum_iv + ii * bin(ii,jj);
      sum_jv = sum_jv + jj * bin(ii,jj);
   end
end
centroid = [sum_iv, sum_jv] / sum_v;

You could of course iterate over each of the labels of the labeled image img, and apply the above code. But that is highly inefficient. Instead, we can loop through the image once and compute all centroids at the same time. We convert sum_v etc. into vectors, containing one running sum per label:

N = max(img(:));     % number of labels
sum_v = zeros(N,1);
sum_iv = zeros(N,1);
sum_jv = zeros(N,1);
for jj=1:size(img,2)
   for ii=1:size(img,1)
      index = img(ii,jj);
      if index>0
         sum_v(index) = sum_v(index) + 1;
         sum_iv(index) = sum_iv(index) + ii;
         sum_jv(index) = sum_jv(index) + jj;
      end
   end
end
centroids = [sum_iv, sum_jv] ./ sum_v;

Thanks for your help, works great! However I've noticed that when I run it, it now lists the centroids in the command window in an array with the y-coordinates in the left column and then the x-coordinates in the right column. I don't understand why it's doing this? It should list them like [x y]? I can swap them around by changing the last line to [sum_jv, sum_iv] ./sum_v but I'm not sure why that works. — , May 07 '19 at 12:00
@KarlDilkington: In MATLAB, the first index `i` is the row number (y coordinate) and the second one `j` the column number (x coordinate). Also, indexing starts at 1. I’ve followed that convention. You can swap `i` and `j` in the above if you want to. — Cris Luengo, May 07 '19 at 13:00

How to find the centroids of pixel clusters within a binary image

1 Answers1