I discovered Halide (the language) a few weeks ago and I really enjoy optimizing parts of my code with it. Nonetheless, I'm struggling to find an optimized implementation of a very basic image processing task: normalization.
Basically, if I is my grayscale image, I just want:
I_norm = (I - min(I)) / (max(I) - min(I))
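To make the target concrete, here is a minimal pure-Python sketch of the formula above, not Halide code. The helper name `normalize_reference` is made up for illustration, and I'm assuming the image is a list of rows and is not constant (otherwise the denominator is zero).

```python
from typing import List


def normalize_reference(image: List[List[float]]) -> List[List[float]]:
    """Min-max normalize a 2D grayscale image given as a list of rows."""
    # Flatten once so min and max see every pixel of the whole domain.
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # Assumption: a constant image is treated as an error here.
        raise ValueError("constant image: max(I) - min(I) is zero")
    return [[(p - lo) / span for p in row] for row in image]


image = [[2.0, 4.0], [6.0, 10.0]]
normalized = normalize_reference(image)
print(normalized)
```

The smallest pixel maps to 0 and the largest to 1; this is the behavior I want the Halide version to reproduce.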
I've managed to come up with this code (using Halide's Python API, but hopefully the C++ version is similar):
    import halide as hl

    def normalize(input: hl.Buffer, height: int, width: int):
        low, high, norm_output = hl.Func('low'), hl.Func('high'), hl.Func('norm_output')
        x, y = hl.Var('x'), hl.Var('y')
        # reduction domain covering the whole image
        dom = hl.RDom([(0, width), (0, height)])
        # two separate reductions: one for the min, one for the max
        low[hl._0] = hl.minimum(input[dom.x, dom.y])
        high[hl._0] = hl.maximum(input[dom.x, dom.y])
        norm_output[x, y] = (input[x, y] - low[0]) / hl.f32(high[0] - low[0])
        low.compute_root()
        high.compute_root()
        norm_output.compute_root().parallel(y).vectorize(x, 8)
        return norm_output
This piece of code works quite well (and it is the fastest I could come up with), but not as soon as I use it in a pyramid. Let's say I'm doing this:
    def get_structure(pyr: List, h: int, w: int, name: str) -> List:
        structure = [hl.Func('%s_%i' % (name, i)) for i in range(len(pyr))]
        norm_structure = [hl.Func('norm_%s_%i' % (name, i)) for i in range(len(pyr))]
        for lv, layer in enumerate(pyr):
            structure[lv][x, y] = some_function(layer)[x, y]  # returns the un-normalized "matrix"
        # apply my normalization function
        for lv, layer in enumerate(pyr):
            norm_structure[lv] = normalize(structure[lv], h, w)
        return norm_structure
Then everything becomes very slow.
Indeed, if I comment out the lines:
    for lv, layer in enumerate(pyr):
        norm_structure[lv] = normalize(structure[lv], h, w)
and return structure instead, my overall pipeline runs in under 40 ms.
As soon as I add the normalization, it skyrockets to **0.
So the question is: how can we efficiently compute a normalization in Halide? We can do lots of very complex things very efficiently, but a simple normalization over the whole domain...?
Note: I've also added scheduling in get_structure(), for example:
    for lv in range(len(pyr)):
        norm_structure[lv].compute_root().parallel(y, 4).vectorize(x, 4)
but it doesn't improve anything.
Also, I'm not satisfied with my code: in the best Halide code I've come up with, I loop over the image twice, once to get the min and once to get the max, before finally computing the normalization, whereas if I were writing it by hand I would maintain two variables, min and max, in a single loop.
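For reference, this is roughly what I mean by a single loop, written as a plain-Python sketch rather than Halide. The function name `min_max_single_pass` is made up for illustration, and the image is assumed to be a non-empty list of rows.

```python
from typing import List, Tuple


def min_max_single_pass(image: List[List[float]]) -> Tuple[float, float]:
    """Return (min, max) of a 2D image while visiting each pixel once."""
    pixels = (p for row in image for p in row)
    try:
        first = next(pixels)
    except StopIteration:
        raise ValueError("empty image")
    # Two running variables, both updated in the same loop.
    lo = hi = first
    for p in pixels:
        if p < lo:
            lo = p
        elif p > hi:
            hi = p
    return lo, hi


lo, hi = min_max_single_pass([[3.0, 1.0], [7.0, 5.0]])
print(lo, hi)
```

The image is read once instead of twice, which is what I would like the Halide reduction to do as well.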
Note also that I've spent a lot of time looking for ways to optimize my code, both in the official Halide apps on GitHub and elsewhere, but I didn't find anything that helps build this simple function efficiently.
Thank you in advance for the help!