33

In an ACM example, I had to build a big table for dynamic programming. I had to store two integers in each cell, so I decided to go for a std::pair<int, int>. However, allocating a huge array of them took 1.5 seconds:

std::pair<int, int> table[1001][1001];

Afterwards, I have changed this code to

struct Cell {
    int first;
    int second;
}

Cell table[1001][1001];

and the allocation took 0 seconds.

What explains this huge difference in time?

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Etan
  • 17,014
  • 17
  • 89
  • 148

4 Answers4

41

std::pair<int, int>::pair() constructor initializes the fields with default values (zero in case of int) and your struct Cell doesn't (since you only have an auto-generated default constructor that does nothing).

Initializing requires writing to each field which requires a whole lot of memory accesses that are relatively time consuming. With struct Cell nothing is done instead and doing nothing is a bit faster.

sharptooth
  • 167,383
  • 100
  • 513
  • 979
  • 4
    does it take 1.5 seconds to set about 8 MB's to zero? – Etan Oct 22 '09 at 12:37
  • 8
    No, the problem is the function calls to the constructor. If your array is global, it's initialized to zero anyway. Probably the compiler doesn't optimize out the constructor calls. – Johannes Schaub - litb Oct 22 '09 at 12:38
  • 2
    You need to call the constructor, find the memory, set it, etc... it's not just setting N bits of contigous memory to 0 via memset. – Aye Oct 22 '09 at 12:40
  • 2
    @Lemurik: I’m not convinced. All the `pair` constructor does is delegating to the constructors of its (POD) members. A compiler could (and IMHO, *should*) recognize this, elide the constructor call and simply zero out the memory. The analysis to do this surely cannot be hard? – Konrad Rudolph Oct 24 '09 at 09:14
  • … alas, it doesn’t (just tested it). :-( – Konrad Rudolph Oct 24 '09 at 09:21
  • 1
    Are we 100% sure the use of `std::pair` is to blame for the performance? I tend to use pair when it makes for more readable code but this is making me reconsider.... – Steven Lu Mar 24 '12 at 06:47
  • Have you tried your class with a default constructor `Cell(): first(), second() {}` ? It should take the same 1.5s, if the constructor is to blame. – user666412 Nov 24 '14 at 13:37
  • Please note that since C++11 both `first` and `second` of `struct Cell` will be zero initialised precisely because the constructor is auto generated. – Killzone Kid Aug 14 '18 at 22:05
29

The answers so far don't explain the full magnitude of the problem.

As sharptooth has pointed out, the pair solution initializes the values to zero. As Lemurik pointed out, the pair solution isn't just initializing a contiguous block of memory, instead it is calling the pair constructor for every element in the table. However, even that doesn't account for it taking 1.5 seconds. Something else is happening.

Here's my logic:

Assuming you were on an ancient machine, say running at 1.33ghz, then 1.5 seconds is 2e9 clock cycles. You've got 2e6 pairs to construct, so somehow each pair constructor is taking 1000 cycles. It doesn't take 1000 cycles to call a constructor that just sets two integers to zero. I can't see how cache misses would make it take that long. I would believe it if the number was less than 100 cycles.

I thought it would be interesting to see where else all these CPU cycles are going. I used the crappiest oldest C++ compiler I could find to see if I could attain the level of wastage required. That compiler was VC++ v6. In debug mode, it does something I don't understand. It has a big loop that calls the pair constructor for each item in the table - fair enough. That constructor sets the two values to zero - fair enough. But just before doing that, it sets all the bytes in a 68 byte region to 0xcc. That region is just before the start of the the big table. It then overwrites the last element of that region with 0x28F61200. Every call of the pair constructor repeats this. Presumably this is some kind of book keeping by the compiler so it knows which regions are initialized when checking for pointer errors at run time. I'd love to know exactly what this is for.

Anyway, that would explain where the extra time is going. Obviously another compiler may not be this bad. And certainly an optimized release build wouldn't be.

Andrew Bainbridge
  • 4,651
  • 3
  • 35
  • 50
  • 6
    This is not the fault of VC++ V6 per say. In debug mode all VC compilers will initialize all bytes allocated on the stack to a trap value (0xCC by default) and all heap memory allocated will be similarly initialized to 0xCD. The purpose is two fold: Cause any code that is operating on (zero-assumed) uninitialized variables to fail loudly, and let you see un-initialized stack in the memory debugger. The last value you see written is a "stack canary value" used to detect stack overflow(.com *giggle*)s. – Emily L. Mar 31 '14 at 15:25
3

These are all very good guesses, but as everyone knows, guesses are not reliable.

I would say just randomly pause it within that 1.5 seconds, but you'd have to be pretty quick. If you increased each dimension by a factor of about 3, you could make it take more like 10+ seconds, so it would be easier to pause.

Or, you could get it under a debugger, break it in the pair constructor code, and then single step to see what it is doing.

Either way, you would get a firm answer to the question, not just a guess.

Community
  • 1
  • 1
Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
1

My guess it's the way std::pair is created. There is more overhead during invoking a pair constructor 1001x1001 times than when you just allocate a memory range.

Ashalynd
  • 12,363
  • 2
  • 34
  • 37