
Using VexCL in C++, I am trying to count all values in a vector that are at or above a certain minimum, and I would like to perform this count on the device. The default Reductors only provide MIN, MAX and SUM, and the examples do not show clearly how to perform such an operation. This code is slow, probably because it is executed on the host instead of the device:

int amount = 0;
int minimum = 5;

for (vex::vector<int>::iterator i = vector.begin(); i != vector.end(); ++i)
{
    if (*i >= minimum)
    {
        amount++;
    }
}

The vector I am using will consist of a large number of values, say millions, most of them zero. Besides the number of values that reach the minimum, I would also like to retrieve a list of the vector indices that hold these values. Is this possible?

KindDragon
Neman
  • I don't know about VexCL, but on the host side you should probably have used [`count_if`](http://en.cppreference.com/w/cpp/algorithm/count) instead of looping yourself. VexCL doesn't have anything similar? – Some programmer dude Sep 05 '14 at 11:06
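The host-side approach suggested in the comment above can be sketched as follows. This is plain standard C++ and does not involve VexCL; the function names `count_at_least` and `indices_at_least` are made up for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Count the elements of `values` that are >= `minimum` (host side only).
std::ptrdiff_t count_at_least(const std::vector<int>& values, int minimum) {
    return std::count_if(values.begin(), values.end(),
                         [minimum](int v) { return v >= minimum; });
}

// Collect the positions of the elements that are >= `minimum`.
std::vector<std::size_t> indices_at_least(const std::vector<int>& values,
                                          int minimum) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] >= minimum) {
            indices.push_back(i);
        }
    }
    return indices;
}
```

Both functions make a single pass over host memory, so a device vector would first have to be copied back to the host before they can be used.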

2 Answers


If you only needed to count elements above the minimum, this would be as simple as

vex::Reductor<int, vex::SUM> sum(ctx);
int amount = sum( vec >= minimum );

The vec >= minimum expression results in a sequence of ones and zeros, and sum then counts ones.
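To make the idea concrete without a device, here is a host-side analogue in standard C++: each comparison is turned into a 0 or 1 flag, and the flags are added up. The helper name `count_by_sum` is hypothetical, and the sketch only mirrors the logic of the expression above, not how VexCL evaluates it.

```cpp
#include <numeric>
#include <vector>

// Host-side analogue of sum(vec >= minimum): every comparison contributes
// 1 when it holds and 0 otherwise, so the total is the number of matches.
int count_by_sum(const std::vector<int>& vec, int minimum) {
    return std::accumulate(
        vec.begin(), vec.end(), 0,
        [minimum](int acc, int v) { return acc + (v >= minimum ? 1 : 0); });
}
```

For the input `{1, 3, 5, 2, 6, 8, 0, 2, 4, 7}` and a minimum of 5, the flags are `0 0 1 0 1 1 0 0 0 1`, which sum to 4.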

Now, since you also need to get the positions of the elements above the minimum, it gets a bit more complicated:

#include <iostream>
#include <vexcl/vexcl.hpp>

int main() {
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));

    // Input vector
    vex::vector<int> vec(ctx, {1, 3, 5, 2, 6, 8, 0, 2, 4, 7});
    int n = vec.size();
    int minimum = 5;

    // Put result of (vec >= minimum) into key, and element indices into pos:
    vex::vector<int> key(ctx, n);
    vex::vector<int> pos(ctx, n);

    key = (vec >= minimum);
    pos = vex::element_index();

    // Get number of interesting elements in vec.
    vex::Reductor<int, vex::SUM> sum(ctx);
    int amount = sum(key);

    // Sort pos by key in descending order.
    vex::sort_by_key(key, pos, vex::greater<int>());

    // First 'amount' of elements in pos now hold indices of interesting
    // elements. Let's use a slicer to extract them:
    vex::vector<int> indices(ctx, amount);

    vex::slicer<1> slice(vex::extents[n]);
    indices = slice[vex::range(0, amount)](pos);

    std::cout << "indices: " << indices << std::endl;
}

This gives the following output:

indices: {
    0:      2      4      5      9
}
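The same key/position steps can be traced on the host with standard C++ only, which may help when checking the device result. This is a sketch, not VexCL code: the function name `indices_via_key_sort` is hypothetical, and `std::stable_sort` is used here as an assumption so that matching indices come out in ascending order.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side walk-through of the key/pos approach:
// 1. key[i] = 1 if vec[i] >= minimum, else 0
// 2. pos[i] = i
// 3. amount = sum of key
// 4. order pos by key, descending, and keep the first `amount` entries
std::vector<int> indices_via_key_sort(const std::vector<int>& vec,
                                      int minimum) {
    const std::size_t n = vec.size();
    std::vector<int> key(n);
    std::vector<int> pos(n);

    for (std::size_t i = 0; i < n; ++i) {
        key[i] = (vec[i] >= minimum) ? 1 : 0;
    }
    std::iota(pos.begin(), pos.end(), 0);

    const int amount = std::accumulate(key.begin(), key.end(), 0);

    // key itself is left untouched; positions are ordered by looking up
    // their flag. The stable sort preserves index order within each group.
    std::stable_sort(pos.begin(), pos.end(),
                     [&key](int a, int b) { return key[a] > key[b]; });

    pos.resize(static_cast<std::size_t>(amount));
    return pos;
}
```

Called with `{1, 3, 5, 2, 6, 8, 0, 2, 4, 7}` and a minimum of 5, it returns the indices 2, 4, 5 and 9, matching the output shown above.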
ddemidov

@ddemidov

Thanks for your help, it is working. However, it is much slower than my original code which copies the device vector to the host and sorts using Boost. Below is the sample code with some timings:

#include <iostream>
#include <cstdio>
#include <cstdlib>   // rand
#include <ctime>     // clock, clock_t, CLOCKS_PER_SEC
#include <vexcl/vexcl.hpp>
#include <vector>
#include <boost/range/algorithm.hpp>

int main()
{
    clock_t start, end;

    // initialize vector with random numbers
    std::vector<int> hostVector(1000000);
    for (int i = 0; i < hostVector.size(); ++i)
    {
        hostVector[i] = rand() % 20 + 1;
    }

    // copy to device
    vex::Context cpu(vex::Filter::Type(CL_DEVICE_TYPE_CPU) && vex::Filter::Any);
    vex::Context gpu(vex::Filter::Type(CL_DEVICE_TYPE_GPU) && vex::Filter::Any);
    vex::vector<int> vectorCPU(cpu, 1000000);
    vex::vector<int> vectorGPU(gpu, 1000000);
    vex::copy(hostVector, vectorCPU);
    vex::copy(hostVector, vectorGPU);

    // sort results on CPU
    start = clock();
    boost::sort(hostVector);
    end = clock();
    std::cout << "C++: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    // sort results on OpenCL
    start = clock();
    vex::sort(vectorCPU, vex::greater<int>());
    end = clock();
    std::cout << "vexcl CPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    start = clock();
    vex::sort(vectorGPU, vex::greater<int>());
    end = clock();
    std::cout << "vexcl GPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    return 0;
}

which results in:

C++: 17 ms
vexcl CPU: 737 ms
vexcl GPU: 1670 ms

using an i7 3770 CPU and a (slow) HD4650 graphics card. From what I've read, OpenCL should be able to sort large vectors quickly. Do you have any advice on how to perform a fast sort using OpenCL and VexCL?

Neman
  • You should not measure the first call to `vex::sort()` (on either device), since it includes the OpenCL compilation overhead. Here is the link to your modified source: https://gist.github.com/ddemidov/cf141c97aa22de32c22d, which results in the following output. CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz (Intel(R) OpenCL); GPU: Tesla K40c (NVIDIA CUDA); C++: 34 ms; vexcl CPU: 281 ms; vexcl GPU: 1 ms – ddemidov Sep 08 '14 at 12:23
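The advice about skipping the first call can be sketched as a small timing helper in standard C++. This is not the code from the linked gist: `time_after_warmup_ms` and its parameters are hypothetical names, and the helper assumes the callable returns only once its work has actually finished (for asynchronous device work, the callable would have to wait for completion itself).

```cpp
#include <chrono>
#include <functional>

// Calls `work` once without timing it (warm-up, which absorbs one-time costs
// such as kernel compilation), then times `runs` further calls and returns
// the average duration of a call in milliseconds.
double time_after_warmup_ms(const std::function<void()>& work, int runs = 5) {
    work();  // warm-up call, deliberately excluded from the measurement

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        work();
    }
    const auto end = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(end - start).count() / runs;
}
```

When benchmarking a sort this way, the callable would typically restore the unsorted input before each call, since sorting already sorted data may not be representative.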