How to get the number of same values in a numpy array efficiently?

Question

I have a 3D medical image. I use scipy.ndimage.measurements.label to find the connected voxel groups, which is very fast. I also want the number of voxels in each label. I use the following code to count the occurrences of each value (suppose image_3d is the array returned by scipy.ndimage.measurements.label). It takes about 2 minutes.

import numpy as np
from skimage.measure import label
import time

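# Stand-in for the labelled volume: integer labels in the range 0..99
image_3d = np.random.randint(100, size=(512, 512, 1024))
t1 = time.time()
# One full pass over the whole array per label value, i.e. 100 passes in total
pixel_count_list = [np.sum((image_3d == i).astype(int)) for i in range(100)]
t2 = time.time()
print("used time: ", t2 - t1)
# about 117 seconds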

Is there a more efficient way to do this?

@yatu Excellent!!! I did not know `return_counts` parameter before! — Jingnan Jia, Oct 10 '20 at 18:14
I'm glad you found a solution to your problem. However, an actual answer/solution should **not** be edited into your question. In general, you should [edit] the question to *clarify* it, but not to include an answer within it. You should create your own answer with the code/solution you used to solve your problem, and then accept it (the system may require a 48 hour delay prior to doing so). When you've solved the problem yourself, [answering your own question is encouraged](/help/self-answer). — double-beep, Oct 10 '20 at 18:20
Bincount is O(n) while unique sorts, and is therefore O(n log n) — Mad Physicist, Oct 11 '20 at 06:07
Research before asking is also encouraged. Also, you have a bunch of superfluous imports. — Mad Physicist, Oct 11 '20 at 06:08

score 0 · Answer 1 · answered Oct 10 '20 at 18:35

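import numpy as np
import time

# Stand-in for the labelled volume: integer labels in the range 0..99
image_3d = np.random.randint(100, size=(512, 512, 1024))
t1 = time.time()
# np.unique with return_counts=True returns the sorted distinct values
# and how many times each one occurs, in a single call
unique_label, count_list = np.unique(image_3d, return_counts=True)
t2 = time.time()
print("used time: ", t2 - t1)
# about 2 seconds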
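
The comment by Mad Physicist suggests np.bincount, which counts in a single linear pass instead of sorting. The following is a minimal sketch of that idea, not a benchmarked result: it assumes the labels are non-negative integers (as they are for a labelled image), and it uses a smaller array and a seeded generator purely for illustration.

```python
import numpy as np

# Smaller stand-in for the labelled volume (assumption: labels are
# non-negative integers in the range 0..99).
rng = np.random.default_rng(0)
image_3d = rng.integers(0, 100, size=(64, 64, 64))

# np.bincount only accepts 1-D input, so flatten the volume first.
# minlength=100 reserves one slot per label even if a label never occurs.
counts = np.bincount(image_3d.ravel(), minlength=100)

# Cross-check against np.unique: counts[k] is the number of voxels with label k.
unique_label, count_list = np.unique(image_3d, return_counts=True)
assert np.array_equal(counts[unique_label], count_list)
```

Here counts is indexed directly by label value, so counts[5] is the voxel count for label 5. np.unique remains the more general choice when the values are not small non-negative integers.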
