
Test Code:

#include <array>

int test(const std::array<int, 10> &arr) {
    return arr[9];
}

I want indexing into a std::array to be as efficient as indexing into a C-style array, which means the STL array operator[] function has to be inlined.

I checked the generated assembly code:

$ g++ --std=c++17  -c test.cpp && objdump -d -C test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <test(std::array<int, 10ul> const&)>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   48 89 7d f8             mov    %rdi,0xfffffffffffffff8(%rbp)
   c:   48 8b 45 f8             mov    0xfffffffffffffff8(%rbp),%rax
  10:   be 09 00 00 00          mov    $0x9,%esi
  15:   48 89 c7                mov    %rax,%rdi
  18:   e8 00 00 00 00          callq  1d <test(std::array<int, 10ul> const&)+0x1d>
  1d:   8b 00                   mov    (%rax),%eax
  1f:   c9                      leaveq
  20:   c3                      retq
Disassembly of section .text._ZNKSt5arrayIiLm10EEixEm:

0000000000000000 <std::array<int, 10ul>::operator[](unsigned long) const>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   48 89 7d f8             mov    %rdi,0xfffffffffffffff8(%rbp)
   c:   48 89 75 f0             mov    %rsi,0xfffffffffffffff0(%rbp)
  10:   48 8b 45 f8             mov    0xfffffffffffffff8(%rbp),%rax
  14:   48 8b 55 f0             mov    0xfffffffffffffff0(%rbp),%rdx
  18:   48 89 d6                mov    %rdx,%rsi
  1b:   48 89 c7                mov    %rax,%rdi
  1e:   e8 00 00 00 00          callq  23 <std::array<int, 10ul>::operator[](unsigned long) const+0x23>
  23:   c9                      leaveq
  24:   c3                      retq
Disassembly of section .text._ZNSt14__array_traitsIiLm10EE6_S_refERA10_Kim:

0000000000000000 <std::__array_traits<int, 10ul>::_S_ref(int const (&) [10], unsigned long)>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 89 7d f8             mov    %rdi,0xfffffffffffffff8(%rbp)
   8:   48 89 75 f0             mov    %rsi,0xfffffffffffffff0(%rbp)
   c:   48 8b 45 f0             mov    0xfffffffffffffff0(%rbp),%rax
  10:   48 8d 14 85 00 00 00    lea    0x0(,%rax,4),%rdx
  17:   00
  18:   48 8b 45 f8             mov    0xfffffffffffffff8(%rbp),%rax
  1c:   48 01 d0                add    %rdx,%rax
  1f:   5d                      pop    %rbp
  20:   c3                      retq

In the generated code, arr[9] is compiled to a function call: callq 1d <test(std::array<int, 10ul> const&)+0x1d>.

If I specify an optimization level, the STL function is inlined as expected:

$ g++ --std=c++17 -Og -c test.cpp && objdump -d -C test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <test(std::array<int, 10ul> const&)>:
   0:   8b 47 24                mov    0x24(%rdi),%eax
   3:   c3                      retq

But my real project is a big project, I can't change the global compile optimization flag. So I want to specify the optimization flags for only some files.

So I added #pragma GCC optimize ("string"...) to my program:

#pragma GCC optimize ("-Og")

#include <array>

int test(const std::array<int, 10> &arr) {
    return arr[9];
}

This pragma does have some effect:

$ g++ --std=c++17 -c test.cpp && objdump -d -C test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <test(std::array<int, 10ul> const&)>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   be 09 00 00 00          mov    $0x9,%esi
   9:   e8 00 00 00 00          callq  e <test(std::array<int, 10ul> const&)+0xe>
   e:   8b 00                   mov    (%rax),%eax
  10:   48 83 c4 08             add    $0x8,%rsp
  14:   c3                      retq
Disassembly of section .text._ZNKSt5arrayIiLm10EEixEm:

0000000000000000 <std::array<int, 10ul>::operator[](unsigned long) const>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   e8 00 00 00 00          callq  9 <std::array<int, 10ul>::operator[](unsigned long) const+0x9>
   9:   48 83 c4 08             add    $0x8,%rsp
   d:   c3                      retq
Disassembly of section .text._ZNSt14__array_traitsIiLm10EE6_S_refERA10_Kim:

0000000000000000 <std::__array_traits<int, 10ul>::_S_ref(int const (&) [10], unsigned long)>:
   0:   48 8d 04 b7             lea    (%rdi,%rsi,4),%rax
   4:   c3                      retq

The std::array<int, 10ul>::operator[] and __array_traits functions are optimized, but as we can see, there is still a function call: callq e <test(std::array<int, 10ul> const&)+0xe>.

So I wonder why #pragma GCC optimize ("-Og") does not take effect as I expect. And how can I force STL functions to be inlined for a specific file?


Note: GCC version: 8.2

  • According to the comment [here](https://stackoverflow.com/a/47222208/9545074), the `#pragma` optimization is not fully equivalent to passing the optimization level on the command line – Mestkon Feb 14 '22 at 13:23
  • Try passing `-O3` to the compiler on the command line. Non-optimized code is for debugging, and leaving functions un-inlined makes this easier. See generated code here - https://godbolt.org/z/3q3hre4Wj – Richard Critten Feb 14 '22 at 13:24
  • "But my real project is a big project, I can't change the global compile optimization flag. " why can you not change the flag? Building a production build without optimizations enabled is not recommended – 463035818_is_not_an_ai Feb 14 '22 at 13:25
  • I wonder if this is an [XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). The desire to specify an optimization level for a project is common, but attempting to do so with `#pragma`s in the code because _"my real project is a big project"_ is not a familiar concept to me. `#pragma GCC optimize ("-Og")` could raise the optimization level, but it could also lower it. – Drew Dormann Feb 14 '22 at 13:44
  • @463035818_is_not_a_number Because our code is very old, more than 10 years... For some unknown reason, optimization is not enabled. So my team lead does not allow me to change the compile flags for legacy code. – Mingfei Gao Feb 14 '22 at 13:57
  • @DrewDormann See comment above, optimization is not enabled for the whole project, and I can not change it... – Mingfei Gao Feb 14 '22 at 13:59
  • (gently) Hit your project leader round the head with a good c++ book until they enable optimisations. If they won't enable optimisations then you don't need to worry about the performance of your code as the application as a whole will be really slow – Alan Birtles Feb 14 '22 at 14:09
  • This scenario seems more familiar to me now. I am _guessing_ that this old project exhibits Undefined Behavior. And the team has discovered that the Undefined Behavior is closer to _desired behavior_ when optimizations are disabled. Now, we need to optimize, so we are figuring out how to circumvent team rules. – Drew Dormann Feb 14 '22 at 14:18
  • Compile the whole project with `-fsanitize=undefined`, then fix those bugs and see if that lets you safely enable optimizations for the whole project. Anti-optimized debug-mode code is [highly inefficient](https://stackoverflow.com/questions/53366394/why-does-clang-produce-inefficient-asm-with-o0-for-this-simple-floating-point), storing / reloading everything between statements instead of keeping values in registers. (Unless you use `register int foo;` which is deprecated in modern C++ because it's assumed you'll compile with optimization.) – Peter Cordes Feb 14 '22 at 19:00

1 Answer


#pragma GCC optimize ("Og") doesn't enable inlining by default because the default when not optimizing is -fno-inline.

Use #pragma GCC optimize ("Og,inline") to enable inlining.
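
Applied to the test file from the question, this could look like the sketch below. It assumes GCC and keeps the pragma above the #include, as in the question, so that the std::array member functions defined in the header are also compiled with these options. Other compilers may ignore the pragma with a warning.

```cpp
// Assumption: compiled with GCC. The pragma is placed before the header,
// as in the question, so that operator[] is defined under these options.
#pragma GCC optimize ("Og,inline")

#include <array>

// Same function as in the question.
int test(const std::array<int, 10> &arr) {
    return arr[9];
}
```

Rebuilding with the same command as in the question (g++ --std=c++17 -c test.cpp && objdump -d -C test.o) should show whether the callq in test() is gone.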

rustyx
  • Or better, if the new code is actually safe (unlike the legacy code that apparently motivates leaving optimization disabled), you can enable full optimization for it with `O3,inline`. – Peter Cordes Feb 14 '22 at 19:01
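
A minimal sketch of the `O3,inline` variant from the comment above, assuming GCC. The `push_options` / `pop_options` pragmas are an addition not mentioned in the thread; they restore the previous options so that code later in the same file keeps the project-wide settings. `sum_ends` is a hypothetical function used only for illustration. If `<array>` was already included earlier in the translation unit, its include guard would presumably leave `operator[]` defined under the old options, so this sketch includes it inside the region.

```cpp
// Assumption: GCC. Save the current options, then raise the optimization
// level for everything defined up to the matching pop_options.
#pragma GCC push_options
#pragma GCC optimize ("O3,inline")

#include <array>

// Hypothetical new, performance-sensitive function.
int sum_ends(const std::array<int, 10> &arr) {
    return arr[0] + arr[9];
}

// Restore the options that were in effect before push_options.
#pragma GCC pop_options
```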