1

I got a func f(x, y, z) in which the values is either 1 and 0, and I need to get the the first 100 coordinates of the values which equals to 1, to reduction/update them to 0.

This is very simple to realize in c and other languages, However, I've been trying to solve it with Halide for a couple of days. Is there any Function or Algorithm that I can use to solve it in Halide Generators?

1 Answers1

2

The question amounts to "How do I implement stream compaction in Halide?" There is much written on parallel stream compaction and it is somewhat non-trivial to do well. See this Stack Overflow answer on doing it in cuda for some discussion and references: CUDA stream compaction algorithm

An quick implementation of simple stream compaction in Halide using a prefix sum looks like so:

#include "Halide.h"
#include <iostream>

using namespace Halide;

static void print_1d(const Buffer<int32_t> &result) {
    std::cout << "{ ";
    const char *prefix = "";
    for (int i = 0; i < result.dim(0).extent(); i++) {
        std::cout << prefix << result(i);
        prefix = ", ";
    }
    std::cout << "}\n";

}

int main(int argc, char **argv) {
    uint8_t vals[] = {0, 10, 99, 76, 5, 200, 88, 15};
    Buffer<uint8_t> in(vals);

    Var x;
    Func prefix_sum;

    RDom range(1, in.dim(0).extent() - 1);
    prefix_sum(x) = (int32_t)0;
    prefix_sum(range) = select(in(range - 1) > 42, prefix_sum(range - 1) + 1, prefix_sum(range - 1));

    RDom in_range(0, in.dim(0).extent());
    Func compacted_indices;
    compacted_indices(x) = -1;
    compacted_indices(clamp(prefix_sum(in_range), 0, in.dim(0).extent() - 1)) = select(in(in_range) > 42, in_range, - 1);

    Buffer<int32_t> sum = prefix_sum.realize(8);
    Buffer<int32_t> indices = compacted_indices.realize(8);

    print_1d(sum);
    print_1d(indices);

    return 0;
}
Zalman Stern
  • 3,161
  • 12
  • 18
  • Thank you, Zalman. The way of getting coordinates is impressive. However, for multi dimensions, if I want to use this kind of method the input must be a buffer, is it possible to realize a buffer in Generator ? – FrancoisMoreau Oct 14 '17 at 03:49
  • for example, I get a f(x, y, z) in generator, I think it's necessary to reduce it to 1 dimension to prefix_sum it , this requires realizing f(x, y, z) to a buffer, is it possible to realize it and get a buffer in Generator? – FrancoisMoreau Oct 14 '17 at 03:57
  • You should be able to use a multidimensional `RDom` to do a 2D prefix sum. There could be a number of things going wrong. If the reduction `Func` is the output, then the bounds cannot extend further than those of the data. (E.g. it is easy to write a reduction that has the initial value at -1 when the first valid result is at 0.) What error are you seeing? – Zalman Stern Oct 14 '17 at 15:34
  • in fact, I use a 4D func in a generator, and the next stage I need to pick out first 100 values(equals to 1) to make them to 0, for 4D RDom it's not easy to presum, and in generator it's not possible to get a buffer to presum, so that's the problem I'm facing now. – FrancoisMoreau Oct 16 '17 at 03:43
  • Per Andrew's comment on the Halide developers list, you can fold the 4D Func down to a 1D Func and use prefix sum. It is likely possible to limit the amount of computation to the 100 elements using a where clause on an RDom. Note that most use cases want to sample from the entire space of the original Func, not just get the first 100 items traversing in a simple incremental order along the dimensions. This makes things trickier but can often be implemented in the index mapping in going from 4D to 1D. – Zalman Stern Oct 16 '17 at 18:14
  • Thank you, by now I have used a Func::define.extern, which did not show a very good perfomance, the 4D to 1D and them presum is impressvie, I'm going to try them and to compare which one is better in real_time's performance. Many thanks. – FrancoisMoreau Oct 17 '17 at 01:35