For context, I'm profiling Memcached and would like to count dTLB misses during the execution of a specific function. Memcached spawns multiple threads, so several threads may be executing the function in parallel. One solution I found is perf's toggle events (Using perf probe to monitor performance stats during a particular function), which should let me achieve this by setting probes on function entry and exit, toggling the event counter on at the entry probe and off at the exit probe.
My questions are:
(a) From my understanding, perf toggle events were implemented on a development branch of Linux kernel 3.x. Have they been incorporated into recent LTS releases of Linux kernel 4.x? If not, are there any alternatives?
(b) Another workaround I found is described here: performance monitoring for subset of process execution. However, I'm not sure it will work correctly for my case. Since Memcached is multi-threaded, I'm concerned that having each thread spawn a new child process may cause too much overhead.
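To make the overhead concern concrete, one in-process alternative I'm considering is to have each worker thread open its own counter with perf_event_open(2) and enable/disable it around the function, so no child process is needed. This is only a minimal sketch, assuming Linux, a CPU that exposes the generic dTLB read-miss cache event, and that I'm free to modify Memcached's source. The helper names (dtlb_read_miss_config, dtlb_counter_open, dtlb_counter_start, dtlb_counter_stop) are hypothetical and not part of perf or Memcached.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Encode "dTLB read misses" as a PERF_TYPE_HW_CACHE config value:
 * cache id | (operation << 8) | (result << 16). */
static uint64_t dtlb_read_miss_config(void)
{
    return (uint64_t)PERF_COUNT_HW_CACHE_DTLB |
           ((uint64_t)PERF_COUNT_HW_CACHE_OP_READ << 8) |
           ((uint64_t)PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
}

/* Open a counter bound to the calling thread only (pid = 0, cpu = -1).
 * The counter starts disabled. Returns a file descriptor, or -1 on error
 * (for example if perf_event_paranoid forbids it or the event is unsupported). */
static int dtlb_counter_open(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HW_CACHE;
    attr.config = dtlb_read_miss_config();
    attr.disabled = 1;        /* do not count until explicitly enabled */
    attr.exclude_kernel = 1;  /* user-space misses only */
    attr.exclude_hv = 1;

    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Call on function entry: start counting for this thread. */
static int dtlb_counter_start(int fd)
{
    return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

/* Call on function exit: stop counting and read the running total.
 * The counter is never reset here, so *total accumulates across calls. */
static int dtlb_counter_stop(int fd, uint64_t *total)
{
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == -1)
        return -1;
    return read(fd, total, sizeof(*total)) == (ssize_t)sizeof(*total) ? 0 : -1;
}
```

The idea would be for each worker thread to call dtlb_counter_open() once at startup, keep the descriptor in thread-local storage, and wrap the function of interest in dtlb_counter_start()/dtlb_counter_stop(). I have not measured how much the two ioctl calls per invocation would perturb the results.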
Any suggestions?