std::vector-like class optimized to hold a small number of items

Question

In one time-critical part of the program there is a member of the class that looks like that: std::vector m_vLinks; During profiling I noticed that about 99.98% of executions this vector holds only 0 or 1 items. However in very rarely cases it might hold more. This vector is definitely a bottleneck according to profiler, so I'm thinking about following optimization:

Craft a hand-made class with vector-like interface
This class will hold true size, one item and optional pointer to the vector
In this case when vector holds 1 item there won't be any dynamic memory allocations, and also accessing this item will be (a bit) faster due to removing of one indirection.
When we need to hold more data vector is dynamically allocated
Of course this vector won't provide one memory block holding all items (not needed here), and also some operations will be more complex

Before starting to prototype this thing to see if it helps, I wonder if anyone encountered custom containers with similar functionality in some 3rd-party libraries?

I already thought about boost::array, but don't want size limit that it imposes

Which operations exactly take most of the time in your scenario? — sharptooth, Feb 21 '12 at 14:27
is vector a bottleneck because you frequently create new ones? In that case I doubt your optimization will help much... — Karoly Horvath, Feb 21 '12 at 14:28
Allocation and freeing vector's dynamic buffer (I use object pool to hold objects that have this member, but anyway need to periodically clean them by swapping vector of Links with empty vector, although this is not frequent operation). — Alex Z, Feb 21 '12 at 14:29
http://llvm.org/docs/doxygen/html/classllvm_1_1SmallVector.html — Benjamin Lindley, Feb 21 '12 at 14:30
did you try any compiler commands? there should be one for vectors that may help just sure the name — pyCthon, Feb 21 '12 at 14:35
You could use a custom allocator to achieve this directly with `std::vector`. Check out Howard Hinnants `stack_alloc`: http://home.roadrunner.com/~hinnant/stack_alloc.html — Darren Engwirda, Feb 21 '12 at 14:45
So to be precise, the class usually holds 0 or 1 object, and should be optimized for that, but it *may* hold an unbounded number of objects? — jalf, Feb 21 '12 at 15:02
@Fahrenheit2539: "This vector is definitely a bottleneck according to profiler" - doing *what* with the vector is definitely a bottleneck? If it's anything other than, "increasing its size from 0 to 1" or "copying it when its size is 1", then I don't see why your change would make a difference. — Steve Jessop, Feb 21 '12 at 15:12

score 9 · Accepted Answer · answered Feb 21 '12 at 14:45

9

LLVM has a class for that called SmallVector.

answered Feb 21 '12 at 14:45

Benjamin Lindley

101,917
9
204
274

2

Of course, the one question: is it faster, really ? There is quite a heavy-weight hierarchy and I do wonder whether it can manage the speed of `vector` or not. – Matthieu M. Feb 21 '12 at 14:58
3

In general, hierarchies don't make code slow. Compilers see right through that. In particular, if you see stuff like `is_pod` in a hierarchy, you know that's compile-time computing. – MSalters Feb 21 '12 at 15:01
@MSalters: Sorry if it didn't sound right. What I meant is that there is way too much code for me to eyeball whether or not it will be faster than `vector`. – Matthieu M. Feb 21 '12 at 15:07
2

@MatthieuM. -- The following program runs in about 4.5 seconds for me: http://pastebin.com/Vmr47x0b -- When I switch it to use `llvm::SmallVector`, it goes to about 1.5 seconds. – Benjamin Lindley Feb 21 '12 at 15:24

score 8 · Answer 2 · answered Feb 21 '12 at 15:00

8

In a non-time-critical part of your code, perform: m_vLinks.reserve(1);. That way, in the time-critical part, there will typically be no dynamic allocation.

answered Feb 21 '12 at 15:00

Robᵩ

163,533
20
239
308

1

I like your proposition. Simple. Keep it simple. (+1) – yves Baumes Feb 21 '12 at 15:28
http://stackoverflow.com/a/36371057/1599699 – Andrew Feb 14 '17 at 07:13

score 5 · Answer 3 · answered Feb 21 '12 at 15:05

My first attempt would be to optimize the memory allocator. Naive malloc implementations are not too efficient, you may want to try tcmalloc or jemalloc.

My second attempt would be to change the allocator. Howard Hinnant has demonstrated how to use a stateful allocator that has some memory preallocated on the stack. This is only Standard compliant in C++11, but may already be supported.

My third attempt would be to change the code and pre-allocate the memory if possible. Instead of building the vector anew each time, you could keep it around: its capacity will not diminish and so subsequent uses won't allocate memory.

There are few chances that a homebrew implementation would match the speed of the std::vector<T> classes, as many of its methods have been tuned for maximum performance.

score 0 · Answer 4 · answered Feb 21 '12 at 14:58

0

I generally use std::list for these cases. The O(N) operations in std::list don't hurt me when N==1.

answered Feb 21 '12 at 14:58

MSalters

173,980
10
155
350

3

I suppose there is at least 1 dynamic memory allocation when item is added into the list (even if it is the first item). I needed the solution that could hold 1 item without any dynamic allocations – Alex Z Feb 21 '12 at 15:01
True, but that dynamic allocation is always the same size (single node), unlike std::vector, and the node pointer is always dereferenced. That's actually not as bad as you'd think, on modern hardware and with modern allocators. – MSalters Feb 21 '12 at 15:06
The dynamic allocation is solved by the pool he's using. However, that leads to poor locality. – phkahler Feb 21 '12 at 15:08

std::vector-like class optimized to hold a small number of items

4 Answers4

Linked