
I have decided to compare the times of passing by value and by reference in C++ (g++ 5.4.0) with the following code:

#include <cstdio>
#include <iostream>
#include <sys/time.h>
using namespace std;

int fooVal(int a) {
    for (size_t i = 0; i < 1000; ++i) {
        ++a;
        --a;
    }
    return a;
}
int fooRef(int & a) {
    for (size_t i = 0; i < 1000; ++i) {
        ++a;
        --a;
    }
    return a;
}

int main() {
    int a = 0;
    struct timeval stop, start;
    gettimeofday(&start, NULL);
    for (size_t i = 0; i < 10000; ++i) {
        fooVal(a);
    }
    gettimeofday(&stop, NULL);
    printf("The loop has taken %lu microseconds\n", stop.tv_usec - start.tv_usec);
    gettimeofday(&start, NULL);
    for (size_t i = 0; i < 10000; ++i) {
        fooRef(a);
    }
    gettimeofday(&stop, NULL);
    printf("The loop has taken %lu microseconds\n", stop.tv_usec - start.tv_usec);
    return 0;
}

It was expected that the fooRef execution would take noticeably more time than the fooVal case because of the extra indirection of "looking up" the referenced value in memory inside fooRef. But the result proved unexpected for me:

The loop has taken 18446744073708648210 microseconds
The loop has taken 99967 microseconds

And the next time I run the code it can produce something like

The loop has taken 97275 microseconds
The loop has taken 99873 microseconds

Most of the time the produced values are close to each other (with fooRef being just a little slower), but sometimes spikes like the one in the output from the first run can occur (in both the fooRef and fooVal loops).

Could you please explain this strange result?

UPD: Optimizations were turned off (-O0).

  • OMG! Did you really wait for `18446744073708648210` microseconds?? – WhiZTiM Mar 04 '17 at 16:14
  • @WhiZTiM Of course not. It runs really fast, as it should. But the output is very confusing. –  Mar 04 '17 at 16:17
  • did you turn on optimizations? – phuclv Mar 04 '17 at 16:20
  • @Lưu Vĩnh Phúc No, optimizations were turned off, -O0 level –  Mar 04 '17 at 16:22
  • as always, benchmarking unoptimized builds is pointless – phuclv Mar 04 '17 at 16:25
  • On a side note: you [shouldn't be using `gettimeofday`](http://stackoverflow.com/questions/5362577/c-gettimeofday-for-computing-time#comment28256531_5362577) for benchmarking though. – WhiZTiM Mar 04 '17 at 16:26
  • this is what an optimized version does: https://godbolt.org/g/tebTkp – phuclv Mar 04 '17 at 16:30
  • To compare "apples to apples", you need to make the second function pass by *constant reference*. Passing by non-const reference is more like passing by pointer. Passing by reference tells the compiler you will modify the variable that was passed to the function; which is not true when passing by value (copy). – Thomas Matthews Mar 04 '17 at 17:40
  • `std::chrono::steady_clock` is your friend here. Though the cache effects mentioned in an answer below will still dominate. For fun, reorder your tests in `main`. – Howard Hinnant Mar 04 '17 at 18:43

3 Answers


If the gettimeofday() function relies on the operating system clock, that clock is not really designed to deliver microseconds accurately. It is typically updated only often enough to appear to show seconds correctly for date/time purposes. Sampling at the microsecond level may therefore be unreliable for a benchmark such as the one you are performing.

You should be able to work around this limitation by making your test time much longer; for example, several seconds.

Again, as mentioned in other answers and comments, the effects of which type of memory is accessed (register, cache, main, etc.) and whether or not various optimizations are applied, could substantially impact results.

As with working around the time sampling limitation, you might be able to somewhat work around the memory type and optimization issues by making your test data set much larger such that memory optimizations aimed at smaller blocks of memory are effectively bypassed.

Tim D

Firstly, you should look at the assembly language to see if there are any differences between passing by reference and passing by value.

Secondly, make the functions equivalent by passing by constant reference. Passing by value guarantees that the caller's variable won't be changed; passing by constant reference preserves the same guarantee.

My belief is that the two techniques should be equivalent in both assembly language and performance.

Thomas Matthews

I'm no expert in this area, but I would tend to think that the reason why the two times are somewhat equivalent is due to cache memory.

When you need to access a memory location (say, address 0xaabbc125 on an IA-32 architecture), the CPU copies the surrounding block of memory (a cache line, typically 64 bytes; here, addresses 0xaabbc100 to 0xaabbc13f) into your cache. Reading from and writing to main memory is very slow, but once the block has been copied into your cache, you can access those values very quickly. This is useful because programs usually access the same range of addresses over and over.

Since you execute the same code over and over and your code doesn't touch much memory, the memory blocks are copied to your cache the first time the function is executed, which probably accounts for most of the 97000 time units. Any subsequent calls to your fooVal and fooRef functions access addresses that are already in your cache, so they require only a few nanoseconds (I'd figure roughly between 10 ns and 1 µs). Thus, dereferencing the reference (a reference is typically implemented as a pointer) takes about double the time of just accessing a value directly, but it's double of not much anyway.

Someone who is more of an expert may have a better or more complete explanation than mine, but I think this could help you understand what's going on here.

A little idea: try running the fooVal and fooRef functions a few times (say, 10 times) before taking the start sample and beginning the loop. That way (if my explanation is correct!), the memory blocks should already be in the cache when you begin looping, which means caching won't be included in your times.

As for the super-high value you got, I can't explain it, but the value is obviously wrong.

It's not a bug, it's a feature! =)

adentinger
  • Thank you for your reply. Caching really could equalize the timings. The idea of running `fooVal` and `fooRef` a few times before measuring unfortunately failed, though - the result is the same as before. –  Mar 04 '17 at 17:06