6

I have just tried to benchmark `std::sort` on both a `std::vector<std::pair<float, unsigned int>>` (filled with `push_back`) and a plain `std::pair<float, unsigned int>> *` array (allocated using `new` and then filled one by one). The comparison function compared only the float parts of the pairs.

Surprisingly, when used on 16M values, std::sort took only about 1940 ms on the std::vector but about 2190 ms on the array. Can anybody explain how the vector can be faster? Is it due to caching, or is the array version of std::sort just poorly implemented?

gcc (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6)

Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz - cache size 8192 KB (the computer has two quad-core CPUs but I assume the sort is only single-threaded)

EDIT: Now you can call me a dumbass, but when I tried to reproduce the code I used for the measurements (I have already deleted the original), I could not reproduce the results - now the array version takes about 1915 +- 5 ms (measured over 32 runs). I can only swear I ran the test with 10 measurements three times (manually) with similar results, but this is not rigorous proof.

There was probably some bug in the original code. A background process seems improbable because I alternated measurements of the vector and array versions, the vector results still hold, and no other user was logged in.

Please consider this question closed. Thanks for your efforts.

Radim Vansa
  • How many times did you run the test? One run is not statistically meaningful at all. – Yet Another Geek Jun 16 '11 at 13:37
  • You should provide the exact testing protocol you followed. For example, if you executed each test only once, or if you ran both tests in the same execution(s), then the test isn't relevant. – Klaim Jun 16 '11 at 13:38
  • On my `i7 920 @ 2.67 GHz` with `gcc 4.5.2`, I get `770 - 780 ms` for the vector and `630 - 650 ms` for the array. What optimization options did you use? (I used `-march=native -O3`, array and vector contained identical 16000000 pairs of random numbers) – Cubbi Jun 16 '11 at 13:39
  • Were the input arrays sorted identically to begin with? How many other things is your machine doing? There are typically 100 threads on a machine, and if some of those are running it will affect performance. The difference there is only 10%. – Paul Beckingham Jun 16 '11 at 13:41
  • @Cubbi: Weird, too. GCC vectors use pointers as iterators when compiled with optimisation and not in debug mode. So the results should be *identical* and yours look significantly different. – Konrad Rudolph Jun 16 '11 at 13:51
  • 2
    @Konrad Given that he doesn't show us any of the code he's actually measuring, who knows? But the difference isn't very large (and he doesn't say how he measured it, either). – James Kanze Jun 16 '11 at 14:05
  • 1
    Are you using the same input data in both tests? – Jay Jun 16 '11 at 14:16
  • @Konrad Rudolph my test prog: https://ideone.com/hXv9V (ideone runs them in exactly the same time!). Also, recompiling with gcc 4.6.0 or 4.6.1 makes them run in the same time on my system (both 640 - 650 ms). Must be something with my 4.5.2 setup. – Cubbi Jun 16 '11 at 15:02
  • See: http://stackoverflow.com/q/3664272/14065 – Martin York Jun 16 '11 at 16:40
  • Okay, I used the `-O3` switch, but the code was not compiled with `-march=native`. The data were identical in all 10 measurements, and the standard deviation is on the order of milliseconds (for both tests). I know that to be statistically correct more measurements are required, but the results are pretty stable even for this number of measurements. – Radim Vansa Jun 16 '11 at 18:27

3 Answers

12

std::vector<std::pair<float, unsigned int>> (filled with push_back operation)

This stores all the data contiguously, so memory locality is very good.

std::pair<float, unsigned int>> * array (allocated using new and then filled one by one)

This scatters the data all over memory.

You've set up a very unfair comparison between the vector and a simple array. The extra indirection involved in the array case is going to hurt, and the lack of locality is going to kill cache performance. I'm surprised you don't see a bigger win in favor of contiguous storage.

Ben Voigt
  • 6
    I don't think that this is right. If I understand correctly, the second allocates an array of `std::pair`, using array new. Locality should be exactly the same. – James Kanze Jun 16 '11 at 14:03
  • @James: That's not what the question literally says, although I understand how you could interpret it that way (and your interpretation may very well be correct). The question says he has an array of pointers to pairs. This is why questions without code, and performance questions especially, are a terrible idea. – Ben Voigt Jun 16 '11 at 14:06
  • @Ben I understood it that he called `std::sort` on 1) an `std::vector<std::pair<float, unsigned int>>` and 2) a `std::pair<float, unsigned int>*`, but it's far from clear. (But I'd expect a far greater difference if he really did use an array of pointers in the second case.) – James Kanze Jun 16 '11 at 14:28
  • @James: If the allocations were done with good time locality, space locality would be mediocre and not outright terrible. That's the best explanation I can offer. – Ben Voigt Jun 16 '11 at 14:34
  • The recursive nature of quicksort along with the cache will make locality less important than you may think. Once you recurse down to some level, all the data is in cache and only indirection is causing a penalty, not locality. I'm assuming quicksort here. – phkahler Jun 16 '11 at 15:28
  • @phkahler: Even quicksort has to access some of the same items repeatedly. If items are individually allocated, often adding a heap header to each, fewer items fit in cache. Anyway, I don't think gcc `std::sort` is quicksort. – Ben Voigt Jun 16 '11 at 15:54
  • @Ben Voigt It's a variant on quick sort, which shifts to a heap sort if it finds it's recursing too deeply, and uses insertion sort when the number of elements in the partition becomes small enough. – James Kanze Jun 16 '11 at 17:10
  • @Ben: Sorry, I doubled the `*` in the array's description; James understood me correctly. It was just an array of pairs; there was no additional level of indirection, so the locality really is the same. – Radim Vansa Jun 16 '11 at 18:23
2

They will use the same version of sort. The difference is quite likely due to random CPU effects, like caching or thread context switches.

Puppy
  • How do you know they use the same version of sort? As I recall, the ISO specification doesn't say how they are implemented. They probably just use the same interface/template. – Yet Another Geek Jun 16 '11 at 13:40
  • 1
    @Yet - If he is using `gcc (GCC) 4.4.5 20110214` we know what std::sort looks like. – Bo Persson Jun 16 '11 at 13:45
  • @Yet Another Geek: Because vector iterators and array iterators are the same thing in release builds. – Puppy Jun 16 '11 at 17:16
  • @DeadMG I thought an array iterated using pointers, and a vector through some kind of type-safe class-based reference – Yet Another Geek Jun 16 '11 at 17:23
  • @Yet Another Geek: In release builds, `vector::iterator` can be typedefed as `T*`. Their specification is identical. – Puppy Jun 16 '11 at 19:19
1

Did you use -O3 to compile your code?

If not, do it. All other benchmark results are meaningless, especially for template code.

Did you run the test many times?

This prevents things like interrupts or caching from having too much of an impact on your results.

Don't use floating-point comparison or arithmetic for benchmarks. The results depend heavily on the compiler, platform, compiler options, etc.

How was your test data created?

The time required by most sorting algorithms changes depending on the ordering of the input data.

Which method for measuring time did you use? Clock cycles? A timer?

Anyway, writing benchmarks that provide reliable results is not as easy as it might seem at first. Don't use a benchmark to determine what the proper code for your problem is.

pmr