Here is one possible approach:
Create an additional centroid index vector:
DistancesValues = {10, 15, 20, 12, 10, 5, 17, 22, 8, 7}
DatapointsIndex = {1, 2, 3, 4, 5, 1, 2, 3, 4, 5}
CentroidIndex = {1, 1, 1, 1, 1, 2, 2, 2, 2, 2}
Now do a sort_by_key, using DatapointsIndex
as the keys, and the other two vectors zipped together as the values. This has the effect rearranging all 3 vectors so that the DatapointsIndex
has like indices grouped together:
DatapointsIndex = {1, 1, 2, 2, 3, 3, 4, 4, 5, 5}
(and the other 2 vectors are correspondingly rearranged).
Now do a reduce_by_key. If we choose the thrust::minimum
operator, we get a reduction that effectively chooses the minimum value in a group (rather than summing the values in a group). reduce_by_key means that a reduction of this type is done on each consecutive group of like keys. So we will once again use the DatapointsIndex
as our key vector, and the other two vectors zipped together as our value vector. Most of the output of reduce_by_key we don't care about, except for the results vector that emanates from the CentroidIndex
vector. By counting centroid indices in this results vector, we can get the desired output.
Here is a fully worked example:
$ cat t428.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/copy.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <stdio.h>
#define NUM_POINTS 5
#define NUM_CENTROID 2
#define DSIZE (NUM_POINTS*NUM_CENTROID)
int main(){
int DistancesValues[DSIZE] = {10, 15, 20, 12, 10, 5, 17, 22, 8, 7};
int DatapointsIndex[DSIZE] = {1, 2, 3, 4, 5, 1, 2, 3, 4, 5};
int CentroidIndex[DSIZE] = {1, 1, 1, 1, 1, 2, 2, 2, 2, 2};
thrust::device_vector<int> DV(DistancesValues, DistancesValues + DSIZE);
thrust::device_vector<int> DI(DatapointsIndex, DatapointsIndex + DSIZE);
thrust::device_vector<int> CI(CentroidIndex, CentroidIndex + DSIZE);
thrust::device_vector<int> Ra(NUM_POINTS);
thrust::device_vector<int> Rb(NUM_POINTS);
thrust::sort_by_key(DI.begin(), DI.end(), thrust::make_zip_iterator(thrust::make_tuple(DV.begin(), CI.begin())));
thrust::reduce_by_key(DI.begin(), DI.end(), thrust::make_zip_iterator(thrust::make_tuple(DV.begin(), CI.begin())), thrust::make_discard_iterator(), thrust::make_zip_iterator(thrust::make_tuple(Ra.begin(), Rb.begin())), thrust::equal_to<int>(), thrust::minimum<thrust::tuple<int, int> >());
printf("CountOfCentroid 1 = %d\n", thrust::count(Rb.begin(), Rb.end(), 1));
printf("CountOfCentroid 2 = %d\n", thrust::count(Rb.begin(), Rb.end(), 2));
return 0;
}
$ nvcc -arch=sm_20 -o t428 t428.cu
$ ./t428
CountOfCentroid 1 = 2
CountOfCentroid 2 = 3
$
As Eric points out in his answer here (your question is almost a duplicate of that one), the sort_by_key
is probably unnecessary. The reordering of this data follows a regular pattern, so we don't need to harness the complexity of sort, and can therefore re-order the data with clever use of iterators. Under those circumstances, it may be possible to do the entire operation (approximately) with a single call to reduce_by_key
.