3

The select expression in Halide is similar to if or switch statement in C. That is to say, the expression select(cond, a, b) will evaluate to a if cond is true, otherwise it evaluates to b.

So, I thought select evaluates only one branch between a and b, just like if does. But it turns out not true, I found select nearly always evaluates both branches. This is annoying especially when a or b is very costly. Maybe I'm not using Halide in the right way?

This problem was asked a long time ago in here, however the answer in that post cannot solve my problem. Here is some actual code:

Var x("x"), y("y");
Func f("f"), g("g");
f(x, y) = x >= 10000;           // a1. f is calculated
// f(x, y) = cast<bool>(0);     // a2. f is constant
Expr power = pow(pow(x, x), y);
g(x, y) = select(f(x, y), power, 0.f);
f.compute_root();               // b1. compute f in advance
// f.compute_inline();          // b2. compute f inline

Buffer<float> result(1000, 1000);
g.realize(result);

double time = Tools::benchmark(3, 100, [&] {
    g.realize(result);
});
cout << "time: " << time * 1e3 << " ms" << endl;

In this example, select evaluates to power(a costly function) or 0.f depending on f(x, y). And f(x, y) always evaluates to false, which means select should always evaluates to 0.f. There are some different test choices:

  • a1: The value of f is calculated, and the result will always be false
  • a2: f is set to the constant value false
  • b1: Compute f in advance
  • b2: Compute f inline

The test result (time cost) is:

  • a1+b1: 4.22 ms
  • a1+b2: 4.15 ms
  • a2+b1: 4.15 ms
  • a2+b2: 0.0788 ms

It seems clearly that power is evaluated in the first three cases, and it should not happen.

An other example is my FAST corner detector. If the threshold is set to 256 (which can never be reached), is_corner will always be false. However, select in L50 always evaluates score_val (very time-consuming) unless vectorize in L64 is disabled.

VictorBian
  • 675
  • 1
  • 4
  • 17
  • 1
    I just found the description of `select` in Halide.h: "Returns an expression similar to the ternary operator in C, except that **it always evaluates all arguments**." So, is there any alternative solution? – VictorBian Mar 25 '21 at 07:10
  • I would be also interested in the select optimization you described. In CPU vectorized code (SSE2, AVX2, ..., without Halide), you can use [horizontal_or and horizontal_and instructions](https://github.com/vectorclass/version2/blob/7f12a5899bcd3a8ae8e3f7814544e6661621da69/vectori256.h#L317) so that if all values of the vector meet some condition, you don't have to perform a costly branch for that vector at all. Unfortunately, I haven't found anything like that in Halide yet. Maybe it can't be easily done for all Halide targets. – Michal Fapso Apr 28 '21 at 15:50

0 Answers0