
According to the boost::tuple documentation, accessing a single element of a tuple has the same performance as accessing a member variable. For example, given the following declarations:

tuple<A, B, C> t1(A(), B(), C());
struct T { A a; B b; C c; };
T t2;

These two expressions should have equal (or negligibly different) performance:

t1.get<2>();
t2.c;

I looked into the sources of boost::tuple and, if I understood them correctly (I am not sure I did), the get<N> function effectively does this:

C& get<2>(tuple<A, B, C>& t)
{
    return t.tail.tail.head;
    // Generally:  return t.tail. <<N times>> .head;
}

This looks more like a traversal of a linked list than direct access and, as far as I understand, has O(N) complexity instead of the O(1) expected from member access. From my past experience with Boost I assume I have got something wrong; but what is my mistake? How does get really work?

FireAphis

2 Answers


You are correct about the list-like structure. However, the chain of accesses can be resolved at compile time and hence boils down to O(1) at run time (given a sufficiently good optimizing compiler).

ltjax
  • You mean, if I traverse the "tails" in a for loop, it is computed at run time, but if I just write `t.tail.tail.tail.tail.tail.tail.tail.head` then modern compilers can optimize it into direct access to the final item? Is there an easy way to know what my compiler does with it? – FireAphis Nov 29 '10 at 08:40
  • Which compilers actually do perform that optimization? – Crashworks Nov 29 '10 at 08:40
  • A loop can be as good if it can be properly unrolled by the optimizer. Note that you are dealing with a structure that is compile-time constant here, not an ordinary dynamic data structure. – ltjax Nov 29 '10 at 08:48
  • To find out whether your compiler optimizes the chained member-access away, you could look at the assembler code that is generated. – ltjax Nov 29 '10 at 08:48
  • I just checked. MSVC does collapse the chained access into one op (for an eight-member tuple, anyway). You can check with the "assembly and source code" property of the Output Files section of the project properties dialog. I think giving GCC the -S option does something similar. – Crashworks Nov 29 '10 at 09:03
  • @FireAphis Even if you did it in a for loop, if the number of iterations is a compile-time constant, and there are no other side effects, your compiler should boil it down to direct access of the requested object – KitsuneYMG Nov 29 '10 at 15:56
  • Accessing a member of a variable `a` is effectively something like `*(&a + offset)`, where the offset is a compile-time constant depending on the sizes of the other member variables of `a`. I can't imagine why a compiler couldn't add a bunch of offsets together into a single offset. So my claim is that ALL commercial compilers support this optimization. – Parakram Majumdar Nov 08 '13 at 12:39

Remember, in C++ the dot operator is not a pointer dereference; it is a direct offset computation. So the general answer is yes: `x.m1.m2. ... .mn` is, for any n, a constant-time operation computable at compile time.

If you want to learn a little about the compiler internals for this without digging very deep, look at LLVM's getelementptr instruction: http://llvm.org/docs/LangRef.html#i_getelementptr This is exactly how a C++ compiler like Clang targets LLVM when compiling a struct member access.

clemahieu