I'm investigating the execution flow of an OpenMP program linked against libgomp. It uses #pragma omp parallel for. I already know that this construct is compiled into, among other things, a call to the GOMP_parallel function, which is implemented as follows:

void
GOMP_parallel (void (*fn) (void *), void *data, 
               unsigned num_threads, unsigned int flags)
{
   num_threads = gomp_resolve_num_threads (num_threads, 0);
   gomp_team_start (fn, data, num_threads, flags, gomp_new_team (num_threads));
   fn (data);
   ialias_call (GOMP_parallel_end) ();
}
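To make the `fn`/`data` pair concrete, here is a minimal, hypothetical sketch of roughly what GCC produces for a `#pragma omp parallel for` loop. The loop body is outlined into a separate function (GCC typically names it something like `main._omp_fn.0`), and the shared variables are packed into a struct whose address is passed as `data`. The names `loop_data`, `outlined_body`, `fake_GOMP_parallel` and `run_parallel_for_demo` are invented for illustration. The stand-in runner executes the simulated threads one after another instead of starting a team, so this only models the data flow, not libgomp's actual threading or GCC's exact chunking arithmetic.

```c
#include <assert.h>

/* Original loop (for reference):
 *   #pragma omp parallel for
 *   for (int i = 0; i < n; i++) a[i] = i * i;
 */

/* Hypothetical: shared variables of the region; its address is "data". */
struct loop_data {
    int *a;
    int n;
};

/* Hypothetical stand-ins for omp_get_thread_num()/omp_get_num_threads(). */
static unsigned current_thread;
static unsigned team_size;

/* Outlined loop body ("fn"): each simulated thread fills one contiguous
   chunk of the iteration space, similar in spirit to schedule(static). */
static void outlined_body(void *arg)
{
    struct loop_data *d = arg;
    int chunk = (d->n + (int)team_size - 1) / (int)team_size;
    int start = (int)current_thread * chunk;
    int end = start + chunk < d->n ? start + chunk : d->n;
    for (int i = start; i < end; i++)
        d->a[i] = i * i;
}

/* Hypothetical sequential stand-in for GOMP_parallel. The real function
   starts a team via gomp_team_start and then runs fn (data) on the
   calling thread too; here each "thread" simply runs in turn. */
static void fake_GOMP_parallel(void (*fn)(void *), void *data,
                               unsigned num_threads, unsigned flags)
{
    (void)flags; /* unused in this sketch */
    assert(num_threads > 0);
    team_size = num_threads;
    for (current_thread = 0; current_thread < num_threads; current_thread++)
        fn(data);
}

/* Roughly what the pragma-annotated loop over a[0..n) turns into. */
static void run_parallel_for_demo(int *a, int n, unsigned num_threads)
{
    struct loop_data d = { a, n };
    fake_GOMP_parallel(outlined_body, &d, num_threads, 0);
}
```

Calling `run_parallel_for_demo(a, 10, 4)` fills `a[i]` with `i * i` for all ten elements, with the four simulated threads covering [0,3), [3,6), [6,9) and [9,10).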

When I run objdump -d on libgomp, GOMP_parallel appears as:

000000000000bc80 <GOMP_parallel@@GOMP_4.0>:
bc80:   41 55                   push   %r13
bc82:   41 54                   push   %r12
bc84:   41 89 cd                mov    %ecx,%r13d
bc87:   55                      push   %rbp
bc88:   53                      push   %rbx
bc89:   48 89 f5                mov    %rsi,%rbp
bc8c:   48 89 fb                mov    %rdi,%rbx
bc8f:   31 f6                   xor    %esi,%esi
bc91:   89 d7                   mov    %edx,%edi
bc93:   48 83 ec 08             sub    $0x8,%rsp
bc97:   e8 d4 fd ff ff          callq  ba70 <GOMP_ordered_end@@GOMP_1.0+0x70>
bc9c:   41 89 c4                mov    %eax,%r12d
bc9f:   89 c7                   mov    %eax,%edi
bca1:   e8 ca 37 00 00          callq  f470 <omp_in_final@@OMP_3.1+0x2c0>
bca6:   44 89 e9                mov    %r13d,%ecx
bca9:   44 89 e2                mov    %r12d,%edx
bcac:   48 89 ee                mov    %rbp,%rsi
bcaf:   48 89 df                mov    %rbx,%rdi
bcb2:   49 89 c0                mov    %rax,%r8
bcb5:   e8 16 39 00 00          callq  f5d0 <omp_in_final@@OMP_3.1+0x420>
bcba:   48 89 ef                mov    %rbp,%rdi
bcbd:   ff d3                   callq  *%rbx
bcbf:   48 83 c4 08             add    $0x8,%rsp
bcc3:   5b                      pop    %rbx
bcc4:   5d                      pop    %rbp
bcc5:   41 5c                   pop    %r12
bcc7:   41 5d                   pop    %r13
bcc9:   e9 32 ff ff ff          jmpq   bc00 <GOMP_parallel_end@@GOMP_1.0>
bcce:   66 90                   xchg   %ax,%ax

First, there isn't any call to GOMP_ordered_end in the source code of GOMP_parallel, yet one appears in the disassembly. Second, GOMP_ordered_end itself consists of:

void
GOMP_ordered_end (void)
{
}

According to the objdump output, this function starts at ba00 and ends at bbbd. How can an empty function have so much code? By the way, there is a comment in the libgomp source code saying that this function should only be used with the ORDERED construct (as the name suggests), which my test does not use.

Finally, my main concern is: why does the source code differ so much from the disassembly? Why, for example, is there no mention of gomp_team_start in the assembly?

The system has gcc version 5.4.0

Márcio Jales
  • gcc inlines small functions. Some of those "functions" might even be CPP macros. – Peter Cordes Oct 18 '17 at 15:25
  • There isn't a call to `GOMP_ordered_end` in your `objdump -d` output. Instead there's a call to `GOMP_ordered_end@@GOMP_1.0+0x70` which is an unnamed function that is located 0x70 bytes after the start of `GOMP_ordered_end@@GOMP_1.0`. – Ross Ridge Oct 18 '17 at 21:01
  • If you really suspect the compiled code is different from the source, the first step is to have a look at the code after C macro expansion (`gcc -E`). The next step is to look at the produced assembly code (`gcc -S`). – dirkt Oct 19 '17 at 09:38

1 Answer

According to the objdump output, this function starts at ba00 and ends at bbbd. How can an empty function have so much code?

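The function itself is small, but GCC used some additional bytes to align the next function, and the bytes after that most likely belong to other local code or data used elsewhere in the library, for which objdump has no symbol and therefore labels relative to GOMP_ordered_end. Here's what I see in my locally compiled ordered.o: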

00000000000003b0 <GOMP_ordered_end>:
 3b0:   f3 c3                   repz retq
 3b2:   66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
 3b9:   1f 84 00 00 00 00 00
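The padding in this listing can be checked with a little arithmetic. `repz retq` occupies 0x3b0 to 0x3b1, and the 14-byte `nopw` fills 0x3b2 to 0x3bf, so the next function starts at 0x3c0, which is consistent with 16-byte function alignment (an assumption here; the actual value depends on GCC flags such as `-falign-functions` and target tuning). The hypothetical helpers below compute that padding and an upper bound on how much space an empty function plus alignment can take, which illustrates why alignment alone cannot account for 0xbd bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of padding needed so that the next function starts at a multiple
   of align. align must be a nonzero power of two. */
uint64_t padding_to(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (addr & (align - 1))) & (align - 1);
}

/* Upper bound on the span of an empty function: the 2-byte "repz retq"
   (f3 c3) plus the worst-case padding before the next aligned function. */
uint64_t max_empty_function_span(uint64_t align)
{
    const uint64_t ret_size = 2;
    return ret_size + (align - 1);
}
```

With 16-byte alignment, `padding_to(0x3b2, 16)` gives 14, matching the `nopw` above, and the bound is 17 bytes, far short of 0xbd.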

First, there isn't any call to GOMP_ordered_end in the source code of GOMP_parallel, yet one appears in the disassembly.

Don't get distracted by the GOMP_ordered_end@@GOMP_1.0+0x70 label in the assembly. All it says is that the call targets some local library function (for which objdump found no symbol information) that happens to be located 0x70 (112) bytes after the start of GOMP_ordered_end. Since it is the first call in GOMP_parallel and its return value ends up being passed on as num_threads, this is most likely gomp_resolve_num_threads.
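The `name+offset` notation comes from how objdump labels an address: when no symbol sits exactly at that address, it falls back to the closest symbol at a lower address and prints the distance. In a stripped libgomp.so (presumably the case here) only exported, versioned symbols remain, so hidden internal functions have no entry of their own. The sketch below is a simplified model of that lookup, not objdump's actual code. The symbol table is hand-built from addresses in the question, and the address 0xf1b0 for `omp_in_final@@OMP_3.1` is inferred from the `+0x2c0` offset rather than read from a real symbol table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct symbol {
    uint64_t addr;
    const char *name;
};

/* Hypothetical table, sorted by address, built from the question's listing. */
static const struct symbol symtab[] = {
    { 0xba00, "GOMP_ordered_end@@GOMP_1.0" },
    { 0xbc00, "GOMP_parallel_end@@GOMP_1.0" },
    { 0xbc80, "GOMP_parallel@@GOMP_4.0" },
    { 0xf1b0, "omp_in_final@@OMP_3.1" }, /* inferred from +0x2c0 */
};

/* Return the closest symbol at or below addr (NULL if none) and store the
   distance from it in *offset. */
const struct symbol *lookup_symbol(uint64_t addr, uint64_t *offset)
{
    const struct symbol *best = NULL;
    size_t n = sizeof symtab / sizeof symtab[0];
    assert(offset != NULL);
    for (size_t i = 0; i < n && symtab[i].addr <= addr; i++)
        best = &symtab[i];
    if (best != NULL)
        *offset = addr - best->addr;
    return best;
}

/* Print addr the way objdump labels call targets: <sym> or <sym+0xoff>. */
void print_symbolized(uint64_t addr)
{
    uint64_t off = 0;
    const struct symbol *s = lookup_symbol(addr, &off);
    if (s == NULL)
        printf("%llx\n", (unsigned long long)addr);
    else if (off == 0)
        printf("%llx <%s>\n", (unsigned long long)addr, s->name);
    else
        printf("%llx <%s+0x%llx>\n", (unsigned long long)addr, s->name,
               (unsigned long long)off);
}
```

For example, `print_symbolized(0xba70)` prints `ba70 <GOMP_ordered_end@@GOMP_1.0+0x70>`, the same label objdump shows for the first call in GOMP_parallel.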

Why, for example, is there no mention of gomp_team_start in the assembly?

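Hm, this looks pretty much like it. Under the x86-64 System V calling convention, this call receives fn (%rbx → %rdi), data (%rbp → %rsi), num_threads (%r12d → %edx), flags (%r13d → %ecx) and, in %r8, the value returned by the preceding call (which takes num_threads in %edi and is presumably gomp_new_team). That is exactly the argument list of gomp_team_start (fn, data, num_threads, flags, gomp_new_team (num_threads)):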

bcb5:   e8 16 39 00 00          callq  f5d0 <omp_in_final@@OMP_3.1+0x420>
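The call targets themselves can also be checked by hand. A 5-byte `e8` call (or `e9` jmp) stores a signed, little-endian 32-bit displacement relative to the address of the next instruction. The hypothetical helper below decodes it from the raw bytes in the listing, which is how objdump arrives at addresses such as f5d0 before attaching a symbol label.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the target of a 5-byte "e8 rel32" (call) or "e9 rel32" (jmp)
   instruction located at insn_addr. The displacement is little-endian,
   signed, and relative to the end of the instruction (insn_addr + 5). */
uint64_t rel32_target(uint64_t insn_addr, const uint8_t bytes[5])
{
    assert(bytes[0] == 0xe8 || bytes[0] == 0xe9);
    uint32_t raw = (uint32_t)bytes[1] | (uint32_t)bytes[2] << 8 |
                   (uint32_t)bytes[3] << 16 | (uint32_t)bytes[4] << 24;
    /* Sign-extend portably instead of relying on an out-of-range cast. */
    int64_t disp = raw >= 0x80000000u ? (int64_t)raw - 0x100000000LL
                                      : (int64_t)raw;
    return insn_addr + 5 + (uint64_t)disp; /* wraps modulo 2^64 */
}
```

For instance, `e8 16 39 00 00` at bcb5 decodes to f5d0, and the final `e9 32 ff ff ff` at bcc9 decodes to bc00, GOMP_parallel_end: the `ialias_call (GOMP_parallel_end) ()` in the source was compiled as a tail call (`jmpq`) after the stack frame was torn down.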
yugr
  • Alignment isn't sufficient to explain `0xbd` bytes for an empty function. Something else must be going on. – Peter Cordes Oct 18 '17 at 15:27
  • @yugr I really meant "GOMP_ordered_end". What I was trying to say is: how did "GOMP_ordered_end" show up in the disassembly of "GOMP_parallel" when there is no call to that function in the source code? – Márcio Jales Oct 18 '17 at 17:24
  • @PeterCordes True but it's not the meat of the question so I didn't bother to check... – yugr Oct 19 '17 at 06:55