5

Short story: I need a container that can store an unlimited number of elements, but does no dynamic memory allocation for a small number of elements.

Long story: I need to store a couple of integers, the number of which is not bounded, but rarely more than two or three. I tried using a vector<int> or list<int>, but the performance as compared to a simple array is worse by a factor of at least 50 in release mode, and 500 in debug mode. For example, I measured creating a vector, calling push_back three times, and destroying it again. This takes around 250ns. Calling reserve(3) upfront helps a bit, the time goes down to 100ns. Using an array instead takes like 3ns, but I cannot use a fixed-size array because the number of elements is unbounded. I need to store a lot of those things, as part of larger objects.

So I guess what I'm asking for is a container that uses some sort of small vector optimization: As long as the number of elements is small they are stored within the object itself, and only when it grows beyond a certain limit it allocates memory on the heap. Does such a thing exist?

As a side node, basic_string is known to have this sort of optimization, so I tried basic_string<int>. Indeed, for a size of 3 it's four times faster than vector<int>, but somehow using a string class for storing integers doesn't feel right. So again, my question: Is there some container that performs well for small size, but if needed can grow unlimited?

EDIT People asked for the actual code. Well, there is nothing special about it, it's straighforward. I cannot post it because it's part of a large benchmarking environment, with special purpose classes for timing, and some classes for obfuscating the code to prevent the compiler from replacing it with {}. You know, it does that when it notices the result isn't used. The core of it comes down to this:

vector<int> v;
v.push_back(17);
v.push_back(19);
v.push_back(23);

and

int x[3];
int k = 0;
x[k++] = 17;
x[k++] = 19;
x[k++] = 23;

each of which is run in a tight timing loop.

EDIT2 The use case for this is a tree with a variable number of branches at each node. So I actually need to store pointers and no integers, but this is on 32-bit so both are the same size. Performance of this is important, because the tree is constantly modified.

Edit3 People keep doubting my numbers. Why do you think they are unreasonable? A vector does dynamic memory allocation, the array doesn't, and this seems to be a good explanation for the difference, or isn't it?

pentadecagon
  • 4,717
  • 2
  • 18
  • 26
  • i'd allocate your average amount into a vector with `reserve` – Daniel A. White Mar 22 '14 at 12:45
  • 4
    Can you show code for the benchmark? – Vaughn Cato Mar 22 '14 at 12:45
  • 3
    Something sounds fishy with your testing methodology. – Joe Mar 22 '14 at 12:45
  • @DanielA.White OP says that he tried that already. – Sergey Kalinichenko Mar 22 '14 at 12:46
  • You can assume `std::vector` is pretty well optimized already IMHO. – πάντα ῥεῖ Mar 22 '14 at 12:46
  • 6
    You cannot measure at 250 ns accuracy, let alone 3 ns. The numbers you’ve measured are completely random. – Konrad Rudolph Mar 22 '14 at 12:47
  • 1
    you should detail your use case, 100ns is likely to be a totally negligible overhead for most applications. Besides, I'm curious: how could you benchmark with nanosecond precision? Real-time kernel? And did you take average and standard deviation over a significant number of runs? Was it a microbenchmark? Code profiling? Otherwise those figures are garbage... – Stefano Sanfilippo Mar 22 '14 at 12:47
  • @Joe What do you mean, do you think the numbers are not reasonable? Which number in particular? – pentadecagon Mar 22 '14 at 12:48
  • "using a string class for storing integers doesn't feel right." What's wrong with it, apart from the name? There's a perfect solution for it, too: `typedef basic_string small_container_for_ints;` – Sergey Kalinichenko Mar 22 '14 at 12:48
  • 1
    @Konrad Of course I did not measure running it once, but many times in a loop. The loop overhead is just two machine instructions and less than 1ns. – pentadecagon Mar 22 '14 at 12:51
  • 1
    Synthetic benchmarks aside, is this making any meaningful difference in actual production code? – user2802841 Mar 22 '14 at 13:01
  • 1
    @pentadecagon I’m not convinced. From experience, well over 95% of all microbenchmarks get it laughably wrong. And that’s not surprising, since microbenchmarks are *very easy* to get wrong. There are whole academic papers written about this subject. I wouldn’t trust any homebrew microbenchmark code. If you used a library, that’s something else of course, but it’s sufficiently rare here on SO that it’s worth mentioning in the question. But just running the benchmark in a code is most assuredly **not** enough. – Konrad Rudolph Mar 22 '14 at 13:09
  • 1
    @πάνταῥεῖ Sure. However, vector does dynamic memory allocation even for a very small size and that takes performance. – pentadecagon Mar 22 '14 at 13:15
  • @pentadecagon and if you do `int* arr = new int[3];` its also dynamic memory allocation... – Sebastian Hoffmann Mar 22 '14 at 13:18
  • @VaughnCato Adding code is difficult, I added some text as to why. And why do you care? Do you think the numbers are not plausible? – pentadecagon Mar 22 '14 at 13:18
  • @pentadecagon: It would really help eliminate the debate over the quality if your test. – Vaughn Cato Mar 22 '14 at 13:20
  • @StefanoSanfilippo I added some text. I need this for storing a tree. – pentadecagon Mar 22 '14 at 13:22
  • @pentadecagon: A shot in the dark here, are you using VS2008 or less to compile? There is an issue where you may need to define _SECURE_SCL=0 in release mode to get better performance with STL. – Vince Mar 22 '14 at 13:25
  • @Konrad Do you think the numbers I posted are unreasonable? They do make sense to me: A `vector` need one or more dynamic memory allocations, and that eats time. An array instead needs exactly no time for memory allocation. – pentadecagon Mar 22 '14 at 13:25
  • @Vince Thanks for the idea, but this isn't it. I won't tell which compiler I'm using, because I think this issue is platform-independent. – pentadecagon Mar 22 '14 at 13:36
  • Here is a program that demonstrates what he's talking about: https://ideone.com/cs4Dcm You'll need to split it into the three files. Having it all in one file makes it easier for the compiler to optimize away the array operations. – Vaughn Cato Mar 22 '14 at 13:46
  • The version using std::basic_string is actually much slower on my system. – Vaughn Cato Mar 22 '14 at 13:48
  • Is the performance requirement for 2 or 3 elements the same as a gazillion elements? If not, use a fixed size array to cover 99% of the case and then string for the rest. – ThomasMcLeod Mar 22 '14 at 13:48
  • 1
    @pentadecagon Sorry, I got carried away over this. No, I think your numbers are entirely reasonable. Well, maybe I’m slightly wary of the huge fold change between the values but the trend is probably right. I also don’t think the question is a priori unreasonable – I’ve had similar problems situations in the past. – Konrad Rudolph Mar 22 '14 at 13:51
  • @ThomasMcLeod That's exactly what I would like to do, but it's quite a lot of work with a number of possible pitfalls so I wanted to know if maybe such a thing exists already. – pentadecagon Mar 22 '14 at 13:58
  • @VaughnCato Thanks for your effort. Your string performance is interesting, but not really important, because I didn't consider this as a viable option anyway. For the record, I used VS2013 for this. – pentadecagon Mar 22 '14 at 14:02
  • This appears to be a dup: http://stackoverflow.com/q/9379204/951890 – Vaughn Cato Mar 22 '14 at 14:03
  • @VaughnCato Indeed, it looks like a duplicate. And I like the solution given there, I'll benchmark it and report back if it solves my problem. – pentadecagon Mar 22 '14 at 14:30
  • The `Container` library of Boost v1.58 (april 2015) has the experimental [`small_vector`](http://www.boost.org/doc/libs/1_58_0/doc/html/container/non_standard_containers.html#container.non_standard_containers.small_vector) container. Also take a look at http://stackoverflow.com/q/18530512/3235496 – manlio May 30 '15 at 14:59

0 Answers0