3

Scenario:

I have an array values with some values. These elements are divided into groups. The grouping information is stored into another array, said groups. This array has the same dimension of values. The i-th element of groups tells at which group belongs the i-th element of values.

I need to compute the partial sum ov values for each group.

Example:

For instance, in the case I have 3 groups:

values = [0.1, 0.5, 10e-5, 12., 10e3, 0.2, 15., 10e-6]
groups = [1  , 1  , 2    , 3  , 3   , 1  , 3  , 2    ]

I'd like to compute this array

sums = [0.1+0.5+0.2, 10e-5+10e-6, 12.+10e3+15.]

The problem:

Since I have a huge number of elements (some millions), and some thousand groups, I use numpy arrays to tread these objects. I'm actually searching for something with complexity O(N), with no loops over the number of groups. Also, any performance improvement is welcome.

Currently, my code looks like this:

for group_id in xrange(0,number_of_groups):
    sums[i]=numpy.sum(values[groups==i])

This code makes number_of_groups*number_of_elements operations.

This code is too slow, I'm searching for something which does not loop over the groups.

Possible C solution:

It is possible to do this work in C. The code in C could be:

for(i=0;i<n_elements;i++)
    sums[groups[i]]+=values[i]

As you can see this code has only number_of_elements iterations. There are no iterations in the array of groups.

So..

How could I write the same in numpy?

Community
  • 1
  • 1
Antonio Ragagnin
  • 2,278
  • 4
  • 24
  • 39

0 Answers0