Is there a performance counter available for code written in the Halide language? I would like to know how many loads, stores, and ALU operations are performed by my code.
The Halide tutorial for scheduling multi-stage pipelines compares different schedules by comparing the amount of allocated memory, loads, stores, and calls to halide Funcs, but I don't see how this information was collected. I suppose it might be possible to use trace_stores, trace_loads, and trace_realizations to print to the console every time one of these operations occurs. This isn't a great option though because it would greatly slow down the program's execution and would require some kind of counting script to compile the long list of console outputs into the desired counts for loads, stores, and ALU operations.