I'm trying to profile a program that performs I/O operations (for example, on a network device). Here are the facts I'm sure of:
1) My workload generator keeps the program saturated with requests.
2) My I/O device is not fully utilized (around 70%).
3) top shows that each CPU core is about 30% idle.
What I don't understand is this: if the workload generator keeps the program saturated with requests, and both CPU and device capacity are still available, shouldn't at least one of them be fully utilized?
The code base I'm working on is extremely large, so I need to narrow the analysis down to the hot path. So far, the only possible reason I can think of is:
A) There may be some "wait" routine in the hot path.
Since the CPU is idle about 30% of the time, the hot path is evidently not always running on a CPU core.
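One way I could test hypothesis A on a small piece of the hot path is to compare wall-clock time with the CPU time of the calling thread: if a step blocks (sleep, lock, blocking I/O), wall time grows while CPU time does not. The following is only a minimal sketch in Python, not taken from my actual code; `measure_off_cpu` is a hypothetical helper name, and it assumes the step under test runs in the calling thread.

```python
import time


def measure_off_cpu(fn, *args, **kwargs):
    """Run fn once and return (wall_seconds, cpu_seconds, off_cpu_fraction).

    off_cpu_fraction is the share of wall time during which the calling
    thread was NOT executing on a CPU (i.e. it was waiting or descheduled).
    """
    wall_start = time.perf_counter()
    cpu_start = time.thread_time()  # CPU time of the calling thread only
    fn(*args, **kwargs)
    cpu = time.thread_time() - cpu_start
    wall = time.perf_counter() - wall_start
    off_cpu = max(0.0, 1.0 - cpu / wall) if wall > 0 else 0.0
    return wall, cpu, off_cpu


# Hypothetical stand-in for a hot-path step that contains a "wait".
wall, cpu, off_cpu = measure_off_cpu(time.sleep, 0.05)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s off-CPU={off_cpu:.0%}")
```

A step with a large off-CPU fraction would be a candidate for the "wait" routine. A step with a fraction near zero is CPU-bound and can probably be ruled out.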
NOTE: Looking at each CPU core individually, no single core is a bottleneck, so this is not a load-balancing problem. Every core is about 30% idle.
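To double-check the per-core numbers, and to see whether the non-busy time is plain idle or I/O wait (Linux accounts for the two separately in /proc/stat), I could diff two snapshots of /proc/stat. This is a rough sketch that assumes the standard Linux column order (user, nice, system, idle, iowait, irq, softirq, steal, ...). The function names and the sample snapshots are made up for illustration and are not measurements from my system.

```python
def parse_proc_stat(text):
    """Return {core_name: [tick counters]} for the per-core lines (cpu0, cpu1, ...)."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr".
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cores[fields[0]] = [int(v) for v in fields[1:]]
    return cores


def idle_breakdown(before, after):
    """Percent of elapsed ticks spent in idle and in iowait, per core."""
    result = {}
    for name, new in after.items():
        delta = [n - o for n, o in zip(new, before[name])]
        total = sum(delta[:8])  # user..steal; guest time is already part of user
        if total == 0:
            continue
        result[name] = {
            "idle": 100.0 * delta[3] / total,
            "iowait": 100.0 * delta[4] / total,
        }
    return result


# Made-up snapshots purely to show the arithmetic.
SNAP_1 = """cpu  200 0 100 1600 100 0 0 0 0 0
cpu0 100 0 50 800 50 0 0 0 0 0
cpu1 100 0 50 800 50 0 0 0 0 0
intr 12345"""
SNAP_2 = """cpu  290 0 130 1640 140 0 0 0 0 0
cpu0 150 0 70 830 50 0 0 0 0 0
cpu1 140 0 60 810 90 0 0 0 0 0
intr 12399"""

print(idle_breakdown(parse_proc_stat(SNAP_1), parse_proc_stat(SNAP_2)))
```

On a real Linux machine, the two snapshots would come from reading /proc/stat twice, a second or so apart. A core that shows mostly iowait rather than idle would suggest threads blocked on I/O, while plain idle would point more toward locks, sleeps, or too few in-flight requests.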
I'm hoping some of you have had a similar experience and can give me some hints for forming hypotheses for the bottleneck analysis.