I am looking for an example of how to use OpenCV's connectedComponentsWithStats() function in Python. Note that it is only available in OpenCV 3 or newer. The official documentation shows only the C++ API, even though the function is also available in the Python bindings. I could not find an example anywhere online.

-
For insights on using the labels to mask the image etc, see [Python OpenCV \- Connected Component Labeling and Analysis \- GeeksforGeeks](https://www.geeksforgeeks.org/python-opencv-connected-component-labeling-and-analysis/) – nealmcb Mar 12 '23 at 00:39
4 Answers
The function works as follows:
# Import the cv2 library
import cv2
# Read the image you want connected components of, as a single-channel grayscale image
src = cv2.imread('/directorypath/image.bmp', cv2.IMREAD_GRAYSCALE)
# Threshold it so it becomes binary
ret, thresh = cv2.threshold(src, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# You need to choose 4 or 8 for connectivity type
connectivity = 4
# Perform the operation. Pass connectivity and ltype by keyword: in the Python
# binding the second and third positional parameters are the optional output arrays.
output = cv2.connectedComponentsWithStats(thresh, connectivity=connectivity, ltype=cv2.CV_32S)
# Get the results
# The first cell is the number of labels
num_labels = output[0]
# The second cell is the label matrix
labels = output[1]
# The third cell is the stat matrix
stats = output[2]
# The fourth cell is the centroid matrix
centroids = output[3]
Labels is a matrix the same size as the input image, where each element holds the label of the component that pixel belongs to.
Stats is a matrix of the statistics the function computes, with one row per label and one column per statistic. The OpenCV documentation describes it as follows:
Statistics output for each label, including the background label, see below for available statistics. Statistics are accessed via stats[label, COLUMN] where available columns are defined below.
- cv2.CC_STAT_LEFT The leftmost (x) coordinate which is the inclusive start of the bounding box in the horizontal direction.
- cv2.CC_STAT_TOP The topmost (y) coordinate which is the inclusive start of the bounding box in the vertical direction.
- cv2.CC_STAT_WIDTH The horizontal size of the bounding box
- cv2.CC_STAT_HEIGHT The vertical size of the bounding box
- cv2.CC_STAT_AREA The total area (in pixels) of the connected component
Centroids is a matrix with the x and y locations of each centroid. The row in this matrix corresponds to the label number.

-
I must say that for some reason, I had to use cv2.THRESH_BINARY instead of cv2.THRESH_BINARY+cv2.THRESH_OTSU, then I had to cast src to integer and thresh to float in order for it to work. I don't know why, but it didn't work otherwise. – Бојан Матовски Jun 27 '16 at 07:54
-
I don't understand why you create the labels matrix when it is then part of the output anyway? – ypnos Jul 01 '16 at 14:14
-
@ypnos You don't need to for connected components with stats, but do for connected components without stats. I think that part was just left over from me doing it the other way. I fixed it now. Cheers! – Zack Knopp Jul 04 '16 at 17:13
-
Thanks so much for this! This is a much better description of how this works than the C++ docs have. – Haldean Brown Oct 11 '16 at 16:34
-
Can someone explain how to use the labels? How do I check which label a centroid belongs to? – recurf Dec 07 '16 at 22:28
-
Each component in the image gets a number (label). The background is label 0, and the additional objects are numbered from 1 to `num_labels-1`. The centroids are indexed by the same numbers as the labels. `centroids[0]` isn't particularly useful; it's just the background. `centroids[1:num_labels]` is what you want. – krs013 Feb 25 '17 at 21:43
-
@ZackKnopp Do you also know how I can order the labels by area, width or height? – matchifang Jul 24 '17 at 19:05
-
@ZackKnopp That's incorrect, you can use the function without stats like this as well: `_, labels = cv2.connectedComponents(segmentation)` :) – smcs Sep 01 '17 at 09:02
-
@matchifang You could create an array with the component areas: `areas=output[2][:,4]`. Then an array with the numbers of the components: `nr=np.arange(output[0])`. Then sort them according to area size: `ranked=sorted(zip(areas,nr))`. With help from here: https://stackoverflow.com/questions/6618515/sorting-list-based-on-values-from-another-list – smcs Sep 01 '17 at 12:43
-
`cv2.connectedComponentsWithStats` does not take connectivity as an input argument in OpenCV 3 or 4, and I don't think the function was present in 2. Is this simply a mixup between `connectedComponentsWithStats` and `connectedComponentsWithStatsWithAlgorithm`? `output = cv2.connectedComponentsWithStats(thresh)` gives the exact same result for me. – Atnas Feb 24 '21 at 19:44
-
Docs say ltype can be CV_32S or CV_16U. What do these do? I can't find any documentation on their impact. – QuentinJS Dec 28 '22 at 21:38
-
Could someone please explain what I could use all of this for? I am trying to extract individual characters/text and landed here. I played around with the code above, it works, but how do I utilize it? I.e. How do I utilize centroids to find the centroids of the text? – CRich Mar 16 '23 at 23:02
I have come here a few times to remember how it works, and each time I have to reduce the above code to:
_, thresh = cv2.threshold(src, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
connectivity = 4  # You need to choose 4 or 8 for connectivity type
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(thresh, connectivity=connectivity, ltype=cv2.CV_32S)
Hopefully, it's useful for everyone :)

Adding to Zack Knopp's answer: if you are using a grayscale image, you can simply use:
import cv2
import numpy as np
# Flag 0 loads the image as single-channel grayscale
src = cv2.imread("path\\to\\image.png", 0)
# Treat every non-zero pixel as foreground
binary_map = (src > 0).astype(np.uint8)
connectivity = 4  # or whatever you prefer
output = cv2.connectedComponentsWithStats(binary_map, connectivity=connectivity, ltype=cv2.CV_32S)
When I tried Zack Knopp's answer on a grayscale image it didn't work, and this was my solution.

The input image needs to be single-channel, so convert it to grayscale first; otherwise OpenCV 4.x raises an error. After the conversion, apply Zack's answer.
import cv2 as cv
src = cv.cvtColor(src, cv.COLOR_BGR2GRAY)
