
I am trying to compute the following without a loop or list comprehension:

        count = np.array([np.sum(X[Z==k], axis=0) for k in range(num_clusters)])

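X is a two-dimensional array, and Z is a one-dimensional array of cluster labels (one per row of X) with values from 0 to num_clusters - 1. Row k of count is therefore the column-wise sum of all rows of X assigned to cluster k.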
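A minimal sketch of one loop-free option, assuming Z holds non-negative integer labels below num_clusters: NumPy's unbuffered scatter-add `np.add.at` accumulates each row of X into the output row selected by its label. The helper name `cluster_sums_add_at` and the small X/Z below are hypothetical, for illustration only.

```python
import numpy as np

def cluster_sums_add_at(X, Z, num_clusters):
    """Hypothetical helper: sum the rows of X per cluster label in Z."""
    # One output row per cluster; dtype follows X so integer input stays integer.
    count = np.zeros((num_clusters, X.shape[1]), dtype=X.dtype)
    # np.add.at accumulates repeated indices correctly; count[Z] += X would not,
    # because buffered fancy-index assignment keeps only one write per index.
    np.add.at(count, Z, X)
    return count

# Small hand-checkable example: rows 0 and 2 go to cluster 0.
X = np.arange(12).reshape(4, 3)
Z = np.array([0, 1, 0, 2])
count = cluster_sums_add_at(X, Z, 3)
print(count)  # expected rows: [6, 8, 10], [3, 4, 5], [9, 10, 11]
```

`np.add.at` has historically been slower than other ufunc operations, so it is worth timing it against the loop on realistically sized data.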

Thanks in advance for any help!

Example data:

import numpy as np

num_clusters = 5
num_datapoints = 100

# Example data matrix of shape (num_datapoints, 15); each of the 15 "features" takes a value from 0 to 19.
X = np.random.choice(20, [num_datapoints, 15])
Z = np.random.choice(num_clusters, num_datapoints)  # random cluster assignments for the 100 datapoints

count = np.array([np.sum(X[Z==k], axis=0) for k in range(num_clusters)])
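One way to express the same computation as a single matrix product, sketched under the assumption that num_clusters is small enough for a num_clusters × num_datapoints membership matrix to fit in memory: build a one-hot matrix from Z and multiply it by X. The seeded `np.random.default_rng(0)` generator replaces `np.random.choice` here only so the comparison is reproducible.

```python
import numpy as np

num_clusters = 5
num_datapoints = 100
rng = np.random.default_rng(0)  # seeded generator (assumption, for reproducibility)
X = rng.integers(0, 20, size=(num_datapoints, 15))
Z = rng.integers(0, num_clusters, size=num_datapoints)

# Loop version from the question, kept as the reference result.
count_loop = np.array([np.sum(X[Z == k], axis=0) for k in range(num_clusters)])

# One-hot membership matrix: onehot[k, i] is 1 when datapoint i is in cluster k.
onehot = (Z == np.arange(num_clusters)[:, None]).astype(X.dtype)
# (num_clusters, n) @ (n, 15) sums the rows of X belonging to each cluster.
count_matmul = onehot @ X

print(np.array_equal(count_loop, count_matmul))  # True
```

The product does num_clusters × num_datapoints × 15 multiply-adds, most of them by zero, so it suits a small number of clusters better than a large one.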

Result (the numbers differ from run to run because the data is random):

array([[171, 178, 148, 136, 100, 108, 125, 158, 135, 118, 133, 149, 143, 112, 198],
       [226, 181, 199, 220, 186, 193, 217, 230, 234, 194, 170, 227, 241, 245, 161],
       [160, 178, 171, 126, 156, 152, 148, 164, 134, 128, 224, 173, 213, 166, 178],
       [162, 161, 229, 216, 182, 217, 229, 168, 245, 155, 188, 187, 210, 219, 158],
       [188, 233, 244, 222, 245, 220, 232, 307, 265, 232, 239, 189, 253, 259, 212]])
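A group-by style sketch that reproduces this kind of result in pure NumPy, assuming integer labels in Z: sort the rows by label so each cluster forms a contiguous block, then sum each block with `np.add.reduceat`. Because `reduceat` returns a single element rather than zero for an empty segment, the sketch only reduces over non-empty clusters and leaves empty ones at zero. The helper name `cluster_sums_reduceat` is hypothetical.

```python
import numpy as np

def cluster_sums_reduceat(X, Z, num_clusters):
    """Hypothetical helper: sort rows by label, then sum contiguous segments."""
    order = np.argsort(Z, kind="stable")
    X_sorted = X[order]
    sizes = np.bincount(Z, minlength=num_clusters)          # rows per cluster
    starts = np.concatenate(([0], np.cumsum(sizes)[:-1]))    # first sorted row per cluster
    count = np.zeros((num_clusters, X.shape[1]), dtype=X.dtype)
    nonempty = sizes > 0
    # Between consecutive non-empty starts there are only empty clusters,
    # so each reduceat segment covers exactly one cluster's rows.
    count[nonempty] = np.add.reduceat(X_sorted, starts[nonempty], axis=0)
    return count

X = np.arange(12).reshape(4, 3)
Z = np.array([0, 2, 0, 2])  # clusters 1 and 3 are empty
count = cluster_sums_reduceat(X, Z, 4)
print(count)  # expected rows: [6, 8, 10], [0, 0, 0], [12, 14, 16], [0, 0, 0]
```

The sort makes this O(n log n) in the number of datapoints, which may or may not matter compared with the loop over clusters.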
PyJulian
  • Sample data? So is this just a groupby / sum? – Dan Nov 08 '19 at 10:41
  • Can you add some input sample data and show what the output should look like? – Sander van den Oord Nov 08 '19 at 10:44
  • Have edited the question. The groupby/sum pointer was helpful, but https://stackoverflow.com/questions/4373631/sum-array-by-number-in-numpy seems to indicate that the loop cannot be avoided without a loss of performance. – PyJulian Nov 08 '19 at 11:16
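The 1-D idiom `np.bincount(labels, weights=values)` can be extended to all columns at once, sketched here under the assumption that Z holds non-negative integer labels: give every (cluster, column) pair its own bin, `Z[i] * n_cols + j`, and do a single `bincount` over the flattened arrays. Since `bincount` with weights returns float64, integer sums are exact only while they stay below 2**53. The helper name `cluster_sums_bincount` is hypothetical, and whether this beats the loop should be measured on real data.

```python
import numpy as np

def cluster_sums_bincount(X, Z, num_clusters):
    """Hypothetical helper: one bincount over (cluster, column) bins."""
    n_rows, n_cols = X.shape
    # bins[i, j] = Z[i] * n_cols + j, a unique bin per (cluster, column) pair.
    bins = Z[:, None] * n_cols + np.arange(n_cols)
    totals = np.bincount(bins.ravel(), weights=X.ravel(),
                         minlength=num_clusters * n_cols)
    # Weighted bincount returns float64; cast back to X's dtype.
    return totals.reshape(num_clusters, n_cols).astype(X.dtype)

rng = np.random.default_rng(0)  # seeded generator (assumption, for reproducibility)
X = rng.integers(0, 20, size=(100, 15))
Z = rng.integers(0, 5, size=100)
count_loop = np.array([np.sum(X[Z == k], axis=0) for k in range(5)])
count_bincount = cluster_sums_bincount(X, Z, 5)
print(np.array_equal(count_loop, count_bincount))  # True
```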
