1

I'm trying to create a customized replacement for std::vector where I know that the vast majority of the time, the size will be <= 3. On rare occasions, I could have up to 32 integers. This is a performance and memory critical vector and needs to be copied frequently, so I want to avoid the allocations for the 99.9% case.

I am using 64-bit compilers on Linux and Windows (gcc & msvc) with c++17. int is known to be 32-bit. If we ever need to support something different, I'll revisit the code.

I'd essentially like this memory structure:

class SmallIntVector
{
    int m_size;
    union
    {
        int m_data[3]; // Used when m_size <=3
        int* m_dataDynamic; // Used when m_size >=4
    };
};

However, the size of this is 24 due to the alignment. I can do this:

class SmallIntVector
{
    int m_size;
    int m_data0; // first data member for m_size <= 3.  Return &m_data0 and treat it as data[3].
    union
    {
        int m_data12[2]; // next two data members.
        int* m_dataDynamic; // Used with m_size>=4
    };
};

But, I'm pretty sure this is UB because of aliasing memory. I could also do this:

struct ForStatic
{
    int m_size;
    int m_data[3];
};

struct ForDynamic
{
    int m_unused; // Don't use this to avoid punning.
    int *m_dataDynamic; // Pointer, must be at an 8 byte offset
};
class SmallIntVector
{
    union
    {
        ForStatic m_stat; // When m_stat.m_size is <=3, use m_stat.m_data.
        ForDynamic m_dyn; // When m_stat.m_size is >=4, use m_dyn.m_data.
    };
};

The size is 16, but I think this is illegal punning of the union: I'd be using m_size from ForStatic while at the same time using m_dataDynamic from ForDynamic.

I can easily avoid UB by using std::memcpy to read the dynamic pointer, but none of this feels right. Is there any way to get the alignment I'm looking for and avoid UB?

I've omitted all of the member functions -- I think they are obvious once the structure is right.

Rob L
  • 2,351
  • 13
  • 23
  • 2
    Personally I would create my own custom allocator and use that with a `std::vector`. The allocator can have space in it (as a class member) for 3 integers in it and use that until the fourth is requested, then it just grabs a chunk for 32 integers so that it wont have to allocate again. – NathanOliver Jun 28 '21 at 19:14
  • Is this simply a case of using `#pragma pack` and similar? https://stackoverflow.com/questions/45116212/are-packed-structs-portable – Den-Jason Jun 28 '21 at 19:19
  • I don't see anything wrong about `SmallIntVector` although I would have thought that `m_unused` in `ForDynamic` should be `m_size`? Overall that would be neater than splattering pack pragmas. You may want to specify an align to ensure the whole shebang is aligned properly. – Den-Jason Jun 28 '21 at 19:22
  • Also see `max_align_t` which would enforce alignment without needing an align pragma: https://stackoverflow.com/questions/33018305/default-union-and-structure-alignment-in-c .... so shove that into your union – Den-Jason Jun 28 '21 at 19:35
  • @NathanOliver I had considered using a custom allocator with std::vector. However, I think that using a std::vector would have considerably more memory overhead and would mostly defeat the purpose. sizeof(std::vector) is typically 24. I'd love to be wrong, though. – Rob L Jun 28 '21 at 19:36
  • @Den-Jason I have to check the size in one of the unions before I know if it is dynamic -- that's how I find out it is dynamic. So, to avoid even more punning, I always would write m_size into ForStatic. The trick here is that the int* needs 8 byte alignment, while the data needs 4 byte alignment. If I change the packing and move the pointer to a 4 byte offset, I'm asking for trouble. I'll edit the last code sample for clarity. – Rob L Jun 28 '21 at 19:38
  • You can have `int m_size` as a separate union member. Plus a `max_align_t dummy`. – Den-Jason Jun 28 '21 at 19:39
  • @RobL Not sure if you can do it better then needing at least 24 bits of space. I would try the vector and custom allocator route and profile. If it isn't up to performance specs, then look at replacing the vector with something else. You may even find that it isn't a container issue, but something else that's eating all of your perofmance. – NathanOliver Jun 28 '21 at 19:45
  • 1
    Have you looked at [Boost's `small_vector`](https://www.boost.org/doc/libs/1_76_0/doc/html/container/non_standard_containers.html#container.non_standard_containers.small_vector) – JDługosz Jun 28 '21 at 19:47
  • Storing UTF-8 characters has the same issue, variable length. The trick is that you store 8-bits sequence, the last bit just tells to continue or not in the next byte. For accessing a particular element you must traverse all bytes before, or store an index, or at least a "last accessed pos" cache. – Ripi2 Jun 28 '21 at 19:48
  • @NathanOliver I agree, don't fix it if not necessary. This is part of a commercial CAD tool, and it has been profiled. Copying a small structure which has this vector was taking about 10% of the entire 55 second run (solving very complex mathematics -- this bookkeeping shouldn't be on the radar). Using custom code solves the performance problem for me, but I'm being greedy and trying to reduce memory, too. :) There can be many millions of these things. – Rob L Jun 28 '21 at 19:49
  • 1
    @JDługosz Thank you, I was unaware of small_vector. I'm guessing this is a great fit. However, sizeof(boost::container::small_vector) is 32. This is partly because boost uses size_t for the size, where in my case I can use int, but that doesn't completely account for the difference. – Rob L Jun 28 '21 at 19:56
  • 1
    Using `memcpy` actually gets optimized out to the regular access, so just use a private member function for read/write access to it and don't worry about it. Being in one place, you can easily update with compiler-specific stuff or `bit_cast` if available, and make sure it really does generate optimal code. – JDługosz Jun 28 '21 at 19:58
  • 1
    @Rob, you can, at the very least, see how Boost does the variant and efficient packing. – JDługosz Jun 28 '21 at 20:00
  • @JDługosz I'd be much more confident if I was using c++20 and had bit_cast available. (Our project is many millions of lines, so switching to c++20 will have to wait until a maintenance opportunity.) Historically, I've simply seen memcpy not get optimized away as much as it should, but that's probably a dated observation. I may end up just having to do it that way, though. – Rob L Jun 28 '21 at 20:02
  • 1
    @RobL right, if you make `get_size()` a private member, it can contain the `memcpy` and you can make sure it optimizes, or contains whatever clues or nuances needed to make it optimize. You can later change it to your own `bit_cast` implementation, and later still change it to `std::bit_cast` or compiler intrinsic. Basically, advantages of it only being coded _once_. – JDługosz Jun 28 '21 at 20:27
  • use `union SmallIntVector { int m_data[4]; int* m_dataDynamic; };` and `m_data[3]` use as `m_size`. this take 16 bytes exactly on both 32/64 bit system – RbMm Jun 28 '21 at 21:01
  • @RbMm This is essentially the same solution and also has the union punning problem -- writing a pointer to m_dataDynamic followed by size read from m_data. – Rob L Jun 28 '21 at 21:07
  • @RobL - this is good solution and 100% ok. 16 bytes only and no any problems – RbMm Jun 28 '21 at 21:22
  • 1
    @RobL also see this question - https://stackoverflow.com/questions/18530512/stl-boost-equivalent-of-llvm-smallvector/30468719 – Den-Jason Jun 29 '21 at 21:38

1 Answers1

0

I would suggest the following:

struct ForStatic
{
    int m_size;               //this must be first - overlapped in union
    int m_data[3];
};

struct ForDynamic
{
    int m_size;                //this must be first - overlapped in union
    int *m_dataDynamic;
};

class SmallIntVector
{
    union
    {
        max_align_t dummy;   // ensures alignment
        int m_size;
        ForStatic m_stat; // When m_size is <=3, use m_stat.m_data.
        ForDynamic m_dyn; // When m_size is >=4, use m_dyn.m_data.
    };
};
Den-Jason
  • 2,395
  • 1
  • 22
  • 17
  • Shouldn't `m_size` not be in the `union`? – NathanOliver Jun 28 '21 at 19:42
  • 1
    `m_size`, `m_stat.m_size`, `m_dyn.m_size` would all occupy the same space. It's semantically easier to follow. – Den-Jason Jun 28 '21 at 19:44
  • @Den-Jason Thank you for the reply. This does perhaps make things clearer, but I think it has the same potential problem. I believe that using SmallIntVector::m_size while simultaneously using SmallIntVector::m_dyn.m_dataDynamic is illegal because it puns the union. You can't legally write to one member and read back the same value as a different type unless you use memcpy. I don't know if this is in violation of that rule. – Rob L Jun 28 '21 at 19:46
  • @Den-Jason While that may be true, using `union::m_size` would be undefined behavior if you are using one of the other members already – NathanOliver Jun 28 '21 at 19:46
  • @NathanOliver would std::variant be a better solution? Looking at https://stackoverflow.com/questions/42082328/where-to-use-stdvariant-over-union and https://stackoverflow.com/questions/25664848/unions-and-type-punning – Den-Jason Jun 28 '21 at 19:51
  • @RobL this might be worth a look: https://www.bfilipek.com/2018/06/variant.html – Den-Jason Jun 28 '21 at 19:55
  • @RobL: The phrase "is illegal" contradicts the Standard's statement that "Although this document states only requirements on C++ implementations, those requirements are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution of programs." Implementations are allowed to specify how they will behave in cases where the Standard imposes no requirements, and tasks that require doing things for which the Standard makes no provision will only be possible on implementations that do precisely that. – supercat Jul 14 '21 at 21:06