0

I am looking at this output of perf report -g.

  Children      Self   Command  Shared Object       Symbol
-   87.46%      3.94%  my_cmd   libgomp.so.1.0.0    [.] omp_in_final                                                           
  + 83.53% omp_in_final                                                                                                     
  + 2.81% clone                                                                                                              
  + 1.12% _start

Based on the perf wiki, my understanding is that the 3.94% and the 87.46% values shown mean that 3.94% of the samples taken occurred directly in the omp_in_final function, while 87.46% of the samples occurred either within the omp_in_final function or a function called by omp_in_final.

What isn't clear to me is what the 83.53% value above is indicating. Since the symbol associated with this value is also omp_in_final, I am not sure why that value is different from the 87.46% value above. Can anyone offer help about why these values are different?

Also, since the graph is in "caller" mode, I think this means that the output is indicating that omp_in_final is calling clone and start_. But, if this is the case, then what are the symbols that are shown underneath omp_in_final when I expand it? Are these functions that called omp_in_final?

-   87.46%     3.94%  my_cmd    libgomp.so.1.0.0    [.] omp_in_final                                                           
   - 83.53% omp_in_final                                                                                                      
        38.80% omp_get_num_procs                                                                                              
      + 21.33% do_manual_omp_version._omp_fn.0                                                                                
      + 3.61% do_bestcase_omp_version._omp_fn.0                                                                               
      + 3.39% do_omp_barrier_only._omp_fn.0

   .
   .
   .

I have looked at the perf report examples on the perf wiki, and I've come across some helpful examples (like this one) that cover other cases. But I haven't yet seen a case like the one above - where the omp_in_final has a symbol under it that is also omp_in_final but with a different percentage next to it.

plafratt
  • 738
  • 1
  • 8
  • 14

1 Answers1

0

After staring at this for a while, I think I might have gotten some insight into what the values mean.

Looking at this level (using different data, because I no longer have the perf.data file from the original run),

-   87.44%     3.69%  my_cmd    libgomp.so.1.0.0    [.] omp_in_final                                                           
   + 83.76% omp_in_final                                                                                                      
   + 2.71% clone                                                                                                              
   + 0.98% _start 

I think the clone and _start lines are showing that, of the 3.69% of samples recorded within omp_in_final, 2.71% were reached in a stack trace from clone to omp_in_final, and 0.98% were reached in a stack trace from _start to omp_in_final.

This becomes evident upon expanding those subtrees.

-   87.44%     3.69%  my_cmd    libgomp.so.1.0.0    [.] omp_in_final                                                           
   + 83.76% omp_in_final                                                                                                      
   - 2.71% clone                                                                                                              
        start_thread                                                                                                          
        omp_in_final                                                                                                          
   - 0.98% _start                                                                                                             
        __libc_start_main                                                                                                     
      - main                                                                                                                  
         - 0.52% GOMP_parallel                                                                                                
              0.50% omp_in_final

From that, we can see that clone and _start both call functions that eventually lead to omp_in_final.

In contrast, the 83.76% associated with the omp_in_final subtree I believe is showing the stack traces outside of that 3.69% - that is, in functions called by omp_in_final. Together, the 83.76% and the 3.69% (=2.71%+0.98%) constitute all of the 87.44% spent within omp_in_final (both directly in the function itself and in functions called by it).

The subtree of omp_in_final in this case looks like this.

-   87.44%     3.69%  my_cmd    libgomp.so.1.0.0    [.] omp_in_final                                                           
   - 83.76% omp_in_final                                                                                                      
        37.82% omp_get_num_procs                                                                                              
      + 21.46% do_manual_omp_version._omp_fn.0                                                                                
      + 3.64% do_bestcase_omp_version._omp_fn.0                                                                               
      + 3.41% do_omp_barrier_only._omp_fn.0                                                                                   
        1.30% dynamic_omp_module4._omp_fn.1                                                                                   
        1.29% dynamic_omp_module4._omp_fn.2                                                                                   
        1.28% dynamic_omp_module1._omp_fn.0                                                                                   
        1.27% dynamic_omp_module3._omp_fn.0                                                                                   
        1.26% dynamic_omp_module2._omp_fn.1                                                                                   
        1.25% dynamic_omp_module4._omp_fn.0                                                                                   
        1.25% dynamic_omp_module2._omp_fn.0                                                                                   
        1.24% dynamic_omp_module4._omp_fn.3                                                                                   
        1.24% dynamic_omp_module3._omp_fn.2                                                                                   
        1.23% dynamic_omp_module3._omp_fn.1

To me, it is a little confusing at first how the output shows the 83.76%, 2.71%, the 0.98% all with the same alignment and formatting, as if these numbers have the same meaning. But, as I read it, I think the subtrees under the 2.71% and 0.98% values are showing stack traces leading up to omp_in_final, whereas the 83.76% subtree is showing the stack traces that start at omp_in_final.

plafratt
  • 738
  • 1
  • 8
  • 14