3

Is there a performance counter available for code written in the Halide language? I would like to know how many loads, stores, and ALU operations are performed by my code.

The Halide tutorial for scheduling multi-stage pipelines compares different schedules by comparing the amount of allocated memory, loads, stores, and calls to halide Funcs, but I don't see how this information was collected. I suppose it might be possible to use trace_stores, trace_loads, and trace_realizations to print to the console every time one of these operations occurs. This isn't a great option though because it would greatly slow down the program's execution and would require some kind of counting script to compile the long list of console outputs into the desired counts for loads, stores, and ALU operations.

Kantthpel
  • 149
  • 10

2 Answers2

2

I'm pretty sure they just used the trace_xxx output and ran some scripts/programs on it.

If you're looking for real performance numbers on a X86 platform, I would go with Intel VTune Amplifier. It's pretty expensive, but may be free if you're in academia (student, teacher, researcher) or it's for an open source project.

Other than that, look at the lowered statement code by setting HL_DEBUG_CODEGEN=1 in the environment and you can get a better idea of the loop structure and data use. Note that this output goes to stderr, not stdout.

EDIT: For Linux, there's perf.

Khouri Giordano
  • 796
  • 3
  • 9
2

We do not have any perf counter based support at present. It is fairly difficult to make it portable. (And on mobile devices, often the OS simply doesn't allow access to the hardware.) The support in Profiling.cpp and src/profiling.cpp could likely be used to drive perf counter operation. The profiling lowering pass adds code to call routines in the runtime which update information about Func and Pipeline execution. This information is collected and aggregated by another thread.

If tracing is run to a file (e.g. using HL_TRACE_FILE) a binary format is used and it is a bit more efficient. See utils/HalideTraceViz for a tool to work with the binary format. This is generally how analyses are done within the team.

There was a small amount of investigation of OProfile, which looked promising but I don't think we ever got code working.

Zalman Stern
  • 3,161
  • 12
  • 18