
I have the following class:

template<ItType I, LockType L>
class ArcItBase;

with (among others) this constructor:

ArcItBase ( StableRootedDigraph& g_, Node const n_ ) noexcept :
  srd ( g_ ), 
  arc ( I == ItType::in 
           ? srd.nodes [ n_ ].head_in 
             : srd.nodes [ n_ ].head_out ) { }

The question (which I don't see how to test) is whether the expression that initialises arc is resolved at compile time or at run time (Release build, full optimization, clang-cl and VC14), given that I == ItType::in is known at compile time (I is either ItType::in or ItType::out) and therefore evaluates to either true or false.

g24l
degski
  • You can always inspect the generated assembler code to check this. – matb Nov 09 '15 at 10:40
  • @matb Thanks for your interest, but 27000+ lines of assembler (and referenced code) seems like looking for a needle in a haystack to me... – degski Nov 09 '15 at 11:05
  • Then build a minimal example of that and examine its assembler code... – Nidhoegger Nov 09 '15 at 11:38
  • Given the number of moving parts in this constructor, I think you're going to have to do a lot of work to get it evaluated at compile time. At least the constructors of ArcItBase and `srd` will need to be `constexpr`, as will `srd`'s `operator[]`. – Richard Hodges Nov 09 '15 at 11:46
  • @RichardHodges The `operator[]` is a wrapper over tbb::concurrent_vector. So I guess I have to specialize the base-class to be able to get this to work at compile-time... Thanks! – degski Nov 09 '15 at 12:00
  • You could use http://gcc.godbolt.org/# to quickly generate assembly code. – Simon Kraemer Nov 09 '15 at 12:14
  • @SimonKraemer Compilers (as per question) specified clang-cl and VC14... – degski Nov 09 '15 at 12:25
  • @degski At least clang is available in multiple versions. – Simon Kraemer Nov 09 '15 at 12:32
  • If the compiler can prove something is a constant at compile time then it can optimize it to a constant, see [this](http://stackoverflow.com/a/26949631/1708801) for an example. Whether it will or not will vary; you have to look at the assembly. If the example is too big for godbolt, then the compiler should give you the ability to produce more readable assembly, see [this for gcc examples, likely applicable to clang](http://stackoverflow.com/q/1289881/1708801), which should allow you to find your needle more effectively. – Shafik Yaghmour Nov 09 '15 at 13:04
  • @ShafikYaghmour "Whether it will or not will vary", that's the question, isn't it... I have no problem producing assembler, no need for god. Even though I could minimise the code (it's already pretty minimal) a bit more, it's a lot of assembler, and in reality I don't know what to look for; I program in C++, not in assembler. – degski Nov 09 '15 at 13:14
  • If the compiler knows a value at compile time, it will generally [fold](https://en.wikipedia.org/wiki/Constant_folding) it into any expressions that depend on it. It's an optimization though, so nothing requires the compiler to do it. Verifying it would mean reading the assembly. – Jason Nov 09 '15 at 23:52

1 Answer


Your code cannot compile unless the value of I (the ItType) is known at compile time.

The template parameter is evaluated at compile time, and the condition I == ItType::in is a core constant expression (standard reference: C++11 5.19/2).
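As a minimal sketch of that point: the enum below is a stand-in for the one in the question, and `Probe` is a hypothetical helper, not part of the original code. If the comparison were not a constant expression, neither the `constexpr` member nor the `static_assert`s would compile.

```cpp
// Hypothetical stand-in for the enum in the question.
enum class ItType { in, out };

// Hypothetical probe: the template argument I is fixed at instantiation,
// so I == ItType::in can initialise a constexpr variable.
template <ItType I>
struct Probe {
    static constexpr bool is_in = (I == ItType::in);
};

// These checks are evaluated by the compiler; no code runs for them.
static_assert(Probe<ItType::in>::is_in, "condition is true for ItType::in");
static_assert(!Probe<ItType::out>::is_in, "condition is false for ItType::out");
```

This only shows that the condition itself is a compile-time constant; it says nothing about how the compiler treats the two operands of the conditional.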

For I == ItType::in, the compiler therefore has to generate code equivalent to

arc(true ? : )

If you wrote that out by hand, the branch would be optimized away. The operands of the conditional, however, will not be evaluated at compile time, since they access what appears to be a non-static member and therefore cannot be evaluated as core constant expressions.
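If you would rather not depend on the optimizer for the selection, one alternative (not taken from the question's code) is tag dispatch, which lets overload resolution pick the member. The sketch below assumes a hypothetical `NodeLinks` struct standing in for `srd.nodes [ n_ ]`; `select_head` and `head_of` are illustrative names.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the types in the question.
enum class ItType { in, out };

struct NodeLinks {
    int head_in;
    int head_out;
};

// One overload per case; overload resolution chooses between them
// at compile time, so no run-time branch is expressed in the source.
constexpr int select_head(const NodeLinks& n, std::true_type)  { return n.head_in; }
constexpr int select_head(const NodeLinks& n, std::false_type) { return n.head_out; }

// Turns the compile-time condition into a type and dispatches on it.
template <ItType I>
constexpr int head_of(const NodeLinks& n) {
    return select_head(n, std::integral_constant<bool, I == ItType::in>{});
}
```

With a run-time `NodeLinks` object, the member is still loaded at run time; only the choice between `head_in` and `head_out` is fixed by the type system.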

However, compilers do not always behave as we expect, so if you actually want to test this, dump the disassembled object file:

objdump -DS file.o

and then you can better navigate the output.

Another option would be to launch the debugger and inspect the code.

Don't forget that you can keep debug symbols even in an optimized build, e.g.

g++ -O3 -g -c foo.cpp
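One way to make the relevant instructions easier to find in a large disassembly is to route the expression through a small non-inlined function with an unmangled, recognisable name. This is only a sketch: `PROBE_NOINLINE`, `Links`, `Picker`, `probe_pick_in` and `probe_pick_out` are hypothetical names, and the attribute spellings are the compiler-specific ones for MSVC-style and GCC-style compilers.

```cpp
// Compiler-specific "do not inline" marker (hypothetical macro name).
#if defined(_MSC_VER)
#  define PROBE_NOINLINE __declspec(noinline)
#else
#  define PROBE_NOINLINE __attribute__((noinline))
#endif

// Hypothetical stand-in for the node data in the question.
struct Links {
    int head_in;
    int head_out;
};

// Same shape as the constructor in question: a conditional on a
// template parameter choosing between two run-time members.
template <bool I>
struct Picker {
    int value;
    explicit Picker(const Links& l) : value(I ? l.head_in : l.head_out) {}
};

// extern "C" keeps the symbol names unmangled, so they can be searched
// for directly in the disassembly.
extern "C" PROBE_NOINLINE int probe_pick_in(const Links* l) {
    return Picker<true>(*l).value;
}

extern "C" PROBE_NOINLINE int probe_pick_out(const Links* l) {
    return Picker<false>(*l).value;
}
```

If the selection has been folded, each of the two functions should contain a single load and no compare or conditional jump; whether that is what you actually see depends on the compiler and flags.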

Below you will find a toy implementation. In the first case, the constructor of arcbase is called with literal values:

arcbase<true> a(10,9);

In the second case, it is given non-const random values that cannot be known at compile time.

After compiling with g++ -std=c++11 -c -O3 -g, the first case produces:

Disassembly of section .text._ZN7arcbaseILb1EEC2Eii:

0000000000000000 <arcbase<true>::arcbase(int, int)>:
        srd isrd;

        arc iarc; 

        public:
        arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
   c:   89 75 f4                mov    %esi,-0xc(%rbp)
   f:   89 55 f0                mov    %edx,-0x10(%rbp)
  12:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  16:   8b 55 f0                mov    -0x10(%rbp),%edx
  19:   8b 4d f4                mov    -0xc(%rbp),%ecx
  1c:   89 ce                   mov    %ecx,%esi
  1e:   48 89 c7                mov    %rax,%rdi
  21:   e8 00 00 00 00          callq  26 <arcbase<true>::arcbase(int, int)+0x26>
  26:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  2a:   8b 00                   mov    (%rax),%eax
  2c:   48 8b 55 f8             mov    -0x8(%rbp),%rdx
  30:   48 83 c2 08             add    $0x8,%rdx
  34:   89 c6                   mov    %eax,%esi
  36:   48 89 d7                mov    %rdx,%rdi
  39:   e8 00 00 00 00          callq  3e <arcbase<true>::arcbase(int, int)+0x3e>
  3e:   c9                      leaveq 
  3f:   c3                      retq  

Whereas the second case:

Disassembly of section .text._ZN7arcbaseILb1EEC2Eii:

0000000000000000 <arcbase<true>::arcbase(int, int)>:
        srd isrd;

        arc iarc; 

        public:
        arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
   0:   53                      push   %rbx
   1:   48 89 fb                mov    %rdi,%rbx
   4:   e8 00 00 00 00          callq  9 <arcbase<true>::arcbase(int, int)+0x9>
   9:   48 8d 7b 08             lea    0x8(%rbx),%rdi
   d:   8b 33                   mov    (%rbx),%esi
   f:   5b                      pop    %rbx
  10:   e9 00 00 00 00          jmpq   15 <arcbase<true>::arcbase(int, int)+0x15>

Looking at the disassembly, you should notice that even in the first case the value 10 is not passed directly to the constructor as a constant; it is only placed in a register, from which it is retrieved.

Here is the output from gdb:

0x400910 <_ZN3arcC2Ei>                  mov    %esi,(%rdi)                                                      
0x400912 <_ZN3arcC2Ei+2>                retq                                                                    
0x400913                                nop                                                                     
0x400914                                nop                                                                     
0x400915                                nop                                                                     
0x400916                                nop                                                                     
0x400917                                nop                                                                     
0x400918                                nop                                                                     
0x400919                                nop                                                                     
0x40091a                                nop                                                                     
0x40091b                                nop                                                                     
0x40091c                                nop                                                                     
0x40091d                                nop                                                                     
0x40091e                                nop                                                                     
0x40091f                                nop                                                                     
0x400920 <_ZN7arcbaseILb1EEC2Eii>       push   %rbx                                                             
0x400921 <_ZN7arcbaseILb1EEC2Eii+1>     mov    %rdi,%rbx                                                        
0x400924 <_ZN7arcbaseILb1EEC2Eii+4>     callq  0x400900 <_ZN3srdC2Eii>                                          
0x400929 <_ZN7arcbaseILb1EEC2Eii+9>     lea    0x8(%rbx),%rdi                                                   
0x40092d <_ZN7arcbaseILb1EEC2Eii+13>    mov    (%rbx),%esi                                                      
0x40092f <_ZN7arcbaseILb1EEC2Eii+15>    pop    %rbx                                                             
0x400930 <_ZN7arcbaseILb1EEC2Eii+16>    jmpq   0x400910 <_ZN3arcC2Ei>

The code for the second case is:

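#include <cstdlib>   // std::srand, std::rand
#include <ctime>     // time
#include <iostream>  // std::cout

struct llist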
{
  int head_in;
  int head_out;

  llist(int a , int b ) : head_in(a), head_out(b) {}
};

struct srd
{
  llist nodes;
  srd(int a, int b) : nodes(a,b) {}
};

struct arc
{
  int y;
  arc( int x):y(x) {}
};


template< bool I > class arcbase
{
  srd isrd;

  arc iarc;

  public:
  arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}

  void print()
  {
    std::cout << iarc.y << std::endl;
  }

};


int main(void)
{

  std::srand(time(0));

  volatile int a_ = std::rand()%100;
  volatile int b_ = std::rand()%4;

  arcbase<true> a(a_,b_);

  a.print();

  return 0;
}
g24l
  • When you write: "However the rest of the conditional will not be optimized...", do you mean that one of the two, `srd.nodes [ n_ ].head_in` or `srd.nodes [ n_ ].head_out`, will be "substituted" at compile time, but that those will not be optimised themselves? – degski Nov 10 '15 at 06:26
  • @degski What happens to srd.nodes depends on how they are filled with values. Assuming that they are sufficiently simple and their values are not run-time dependent, they will be optimized out. I will update the post with some new information to show this. However, I would not expect any optimization apart from what is described above. – g24l Nov 10 '15 at 11:07
  • The values are run-time dependent. I was not expecting them to be optimised out, just the selection of `srd.nodes [ n_ ].head_in` or `srd.nodes [ n_ ].head_out`. Thanks for all the work you've put in, I've accepted your answer as the right one... – degski Nov 10 '15 at 12:29