0

I have an algorithm in which I repeatedly do a reduce_by_key and get a vector of reduced values out, one per keyed segment. I keep track of the best output value for each segment over successive iterations. It looks like this:

thrust::device_vector<int> keys(N), values(N), outKeys(N), outValues(N), outValuesBest(N);

// Set up the keys
// Initialize outValuesBest

while (1) {
    // Get some values (keys stay the same across iterations)

    // Do the reduce
    auto outEnd = thrust::reduce_by_key(thrust::device,
        keys.begin(), keys.end(), values.begin(),
        outKeys.begin(), outValues.begin(),
        [] __device__(int ka, int kb) { return ka == kb; },
        [] __device__(int a, int b) { return min(a, b); });

    size_t nSegments = outEnd.first - outKeys.begin();

    auto outValuesp = outValues.begin();
    auto outValuesBestp = outValuesBest.begin();

    // Update the per-segment vector of best results
    thrust::for_each_n(thrust::device,
        thrust::counting_iterator<size_t>(0), nSegments, [=] __device__(size_t i) {
        if (outValuesp[i] < outValuesBestp[i]) {
            outValuesBestp[i] = outValuesp[i];
        }
    });
}

As you can see, my current implementation uses a separate algorithm to compare each element of the reduced output and update the corresponding element in the best vector for each segment that improved. I would like to get rid of the second operation and the second copy of the array (and its associated bandwidth). This is a job for kernel fusion using fancy iterators in Thrust. Does anyone know of a way to fuse these two kernels?

I'm envisioning a new kind of fancy iterator to handle this. Call it a conditional_discard_iterator. It would allow me to say the following instead. The idea is that it only actually writes the output element to the underlying iterator if the predicate returns true and discards it if false.

thrust::device_vector<int> keys(N), values(N), outKeys(N), outValuesBest(N);

// Set up the keys...
// Initialize outValuesBest...

while (1) {
    // Get some values...

    auto OutIt = make_conditional_discard_iterator(outValuesBest.begin(),
      [] __device__(int newValue, int fromOutValuesBest)
        { return newValue < fromOutValuesBest; });

    auto outEnd = thrust::reduce_by_key(thrust::device,
        keys.begin(), keys.end(), values.begin(),
        outKeys.begin(), OutIt,
        [] __device__(int ka, int kb) { return ka == kb; },
        [] __device__(int a, int b) { return min(a, b); });

    size_t nSegments = outEnd.first - outKeys.begin();
}

The inputs to the predicate could be a bit more general but otherwise this seems like a pretty elegant solution. I'm just hoping something like it already exists. If not in Thrust, is there in Boost or STL?

To some it is less than obvious that this question is not a duplicate of another, simply because it involves the subject of fancy output iterators.

All The Rage
  • 743
  • 5
  • 24
  • 2
    the `transform_output_iterator` described [here](https://stackoverflow.com/questions/34370201/combining-two-lists-by-key-using-thrust/34424403#34424403) may be of interest. The answer there also links to something that may be similar in boost. – Robert Crovella Dec 07 '18 at 09:11
  • @Robert Crovella, that's a very useful comment in pointing out that my question can be answered using a custom output iterator (though not the same one as in that question.) – All The Rage Dec 17 '18 at 16:09
  • @talonmies, this question is not a duplicate. The question you link to does not have the same goal, is not a reduction, and none of those answers will answer this question. You should not have marked it as a duplicate. – All The Rage Dec 17 '18 at 16:10

0 Answers0