I want to get a sampling profile of my program that includes blocked time (waiting for a network service) as well as CPU time.
perf's default profiling mode (perf record -F 99 -g -- ./binary) samples only while the program is running on a CPU, so it doesn't give a clear indication of how much wall-clock time my program spends in each of its parts: it's skewed toward CPU-intensive parts and doesn't show IO-intensive parts at all. The sleep time profiling mode (related on SO) shows sleep times but no general profile.
What I'd like is something really simple: record a call stack of my program every 10ms, no matter whether it's running or currently blocked. Then make a flamegraph out of that.
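To make the goal concrete, here is a minimal sketch of the kind of wall-clock sampling I mean. It is written in Python purely as an illustration and only samples a Python process from the inside, so it is not a solution for a native binary under perf. All function names (sample_stacks, wait_for_network, burn_cpu, workload, profile) are made up for this example, and the sleep call is just a stand-in for a blocking network request.

```python
import collections
import sys
import threading
import time


def sample_stacks(thread_id, interval, stop, counts):
    """Record the target thread's stack roughly every `interval` seconds of wall time."""
    while not stop.is_set():
        # The frame is available whether the thread is running or blocked.
        frame = sys._current_frames().get(thread_id)
        names = []
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        if names:
            # Folded-stack key: root frame first, leaf frame last.
            counts[";".join(reversed(names))] += 1
        time.sleep(interval)


def wait_for_network():
    time.sleep(0.3)  # hypothetical stand-in for a blocking network call


def burn_cpu():
    deadline = time.perf_counter() + 0.1
    while time.perf_counter() < deadline:
        pass  # busy loop, stand-in for CPU-bound work


def workload():
    wait_for_network()
    burn_cpu()


def profile(func, interval=0.01):
    """Run `func` in the calling thread while a helper thread samples its stack."""
    counts = collections.Counter()
    stop = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), interval, stop, counts),
    )
    sampler.start()
    try:
        func()
    finally:
        stop.set()
        sampler.join()
    return counts


if __name__ == "__main__":
    for stack, n in sorted(profile(workload).items()):
        print(stack, n)
```

Each output line has the form "frame;frame;frame count", which is the folded-stack format that flamegraph.pl takes as input. The point is that the blocked function shows up in the samples alongside the CPU-bound one, in proportion to elapsed time rather than CPU time.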