
I am new to Halide and have written a simple program that computes max(127, pix(x, y)) for every pixel in an image. The code runs fine on the CPU, but it gives me wrong outputs when I set Target::CUDA, and I can't find the issue. Part of my code follows. Please let me know whether there is a mistake in it, or whether I have to rebuild Halide with CUDA support enabled.

Halide::Var x, y;
Halide::Buffer<uint8_t> inputImageBuf(inpImg, imgSizes);

Halide::Func reluOp("ReLU Operation");
reluOp(x,y) = Halide::max(127, inputImageBuf(x, y));

int numTiles = 4;  // tile size passed to gpu_tile: each GPU block covers a 4x4 tile of pixels
Halide::Var threads_x, threads_y, blocks_x, blocks_y;

Halide::Target targetCUDA = Halide::get_host_target();
targetCUDA.set_feature(Halide::Target::CUDA);
targetCUDA.set_feature(Halide::Target::Debug);
reluOp.gpu_tile(x, y, blocks_x, blocks_y, threads_x, threads_y, numTiles, numTiles, Halide::TailStrategy::Auto, Halide::DeviceAPI::CUDA);

// reluOp.compile_jit(targetCUDA);  
reluOp.print_loop_nest();
Halide::Buffer<uint8_t> result = reluOp.realize(cols, rows, targetCUDA);

result.copy_to_host();

1 Answer


One thing to try is adding inputImageBuf.set_host_dirty() before calling realize. If that helps, I would consider that a bug in Halide.

You can also scroll through the debug output (enabled by the Target::Debug feature) and check whether the expected number of copies to and from the host is happening.

Andrew Adams
    Thanks, Andrew. Setting the host 'dirty' flag worked. I would like to know why setting it made it work. Could you briefly explain why? – Gautam Krishna Jun 14 '18 at 03:50