3

I wanted to write the Digital Search Tree in C++ using templates. To do that given a type T and data of type T I have to iterate over bits of this data. Doing this on integers is easy, one can just shift the number to the right an appropriate number of positions and "&" the number with 1, like it was described for example here How to get nth bit values . The problem starts when one tries to do get i'th bit from the templated data. I wrote something like this

#include <iostream>

template<typename T>
bool getIthBit (T data, unsigned int bit) {
    return ((*(((char*)&data)+(bit>>3)))>>(bit&7))&1;
}

int main() {
    uint32_t a = 16;
    for (int i = 0; i < 32; i++) {
        std::cout << getIthBit (a, i);
    }
    std::cout << std::endl;
}

Which works, but I am not exactly sure if it is not undefined behavior. The problem with this is that to iterate over all bits of the data, one has to know how many of them are, which is hard for struct data types because of padding. For example here

#include <iostream>

struct s {
    uint32_t i;
    char c;
};

int main() {
    std::cout << sizeof (s) << std::endl; 
}

The actual data has 5 bytes, but the output of the program says it has 8. I don't know how to get the actual size of the data, or if it is at all possible. A question about this was asked here How to check the size of struct w/o padding? , but the answers are just "don't".

  • 8 is the actual size of your `struct s` data. Some of those are just undefined, so may be either always zero or sometimes random. – CiaPan Sep 09 '21 at 14:59
  • @CiaPan then how to iterate over the defined bits of data – Don'tDownVote Sep 09 '21 at 15:01
  • You could enforce a byte-level padding, but 1) this is not always possible (some hadware solutions do not let you read or write double-byte data if it is not aligned at a word-size boundary) and 2) even if it was it doesn't help for bit fields (you can declare a one-byte structure that actually uses just 3 bits of the byte). – CiaPan Sep 09 '21 at 15:02
  • You can use struct packing, but the behavior is going to be implementation-defined. There's also the issue of byte order. The code isn't going to be portable across different platforms, but will work on e.g. x86. – rustyx Sep 09 '21 at 15:11
  • @rustyx: relying on the binary representation is a hack and *a priori* non-portable. –  Sep 09 '21 at 15:33
  • Blindly using a DST on general data types, such as structs or just floats, is maybe not the best idea. The distribution of the bits can be quite skewed, leading to degenerate trees with a large height. –  Sep 09 '21 at 15:35
  • @YvesDaoust DST works quite well for arbitrary distribution of keys because the maximum height of the tree is the number of bits of the data type (for example 32 for floats) – Don'tDownVote Sep 09 '21 at 15:39

2 Answers2

3

It's easy to know know how many bits there are in a type. There's exactly CHAR_BIT * sizeof(T). sizeof(T) is the actual size of the type in bytes. But indeed, there isn't a general way within standard C++ to know which of those bits - that are part of the type - are padding.

I recommend not attempting to support types that have padding as keys of your DST.


Following trick might work for finding padding bits of trivially copyable classes:

  • Use std::memset to set all bits of the object to 0.
  • For each sub object with no sub objects of their own, set all bits to 1 using std::memset.
  • For each sub object with their own sub objects, perform the previous and this step recursively.
  • Check which bits stayed 0.

I'm not sure if there are any technical guarantees that the padding actually stays 0, so whether this works may be unspecified. Furthermore, there can be non-classes that have padding, and the described trick won't detect those. long double is typical example; I don't know if there are others. This probably won't detect unused bits of integers that underlie bitfields.

So, there are a lot of caveats, but it should work in your example case:

s sobj;
std::memset(&sobj, 0, sizeof sobj);
std::memset(&sobj.i, -1, sizeof sobj.i);
std::memset(&sobj.c, -1, sizeof sobj.c);

std::cout << "non-padding bits:\n";
unsigned long long ull;
std::memcpy(&ull, &sobj, sizeof sobj);
std::cout << std::bitset<sizeof sobj * CHAR_BIT>(ull) << std::endl;
eerorika
  • 232,697
  • 12
  • 197
  • 326
  • Though there aren't standard utilities, for aggregate types we _are_ able to statically call a function on each member individually with C++17 by abusing structured-binding expressions. This allows us to then just access each member and just disregard the padding. At which point a general solution that iterates bits on a single T type can be used on each member – Human-Compiler Sep 09 '21 at 15:26
  • How about `memset`-ing bit fields...? – CiaPan Sep 09 '21 at 15:48
  • @CiaPan Memset works on bytes. You could instead just assign each bitfield though since they're just integers. – eerorika Sep 09 '21 at 15:49
  • But this approach is not general. The data structure is supposed to just get type T and work regardless of the fields of struct T – Don'tDownVote Sep 09 '21 at 15:52
  • @Don'tDownVote As I said: **There isn't a general way** – eerorika Sep 09 '21 at 15:53
  • ...which requires the counting routine to know exactly what are the components of the structure of question, thus disabling any automatic way to count the 'information' bits (as opposed to 'void' padding bits). in it. – CiaPan Sep 09 '21 at 15:53
2

There's a Standard way to know if a type has unique representation or not. It is std::has_unique_object_representations, available since C++17.

So if an object has unique representations, it is safe to assume that every bit is significant.

There's no standard way to know if non-unique representation caused by padding bytes/bits like in struct { long long a; char b; }, or by equivalent representations¹. And no standard way to know padding bits/bytes offsets.

Note that "actual size" concept may be misleading, as padding can be in the middle, like in struct { char a; long long b; }

Internally compiler has to distinguish padding bits from value bits to implement C++20 atomic<T>::compare_exchange_*. MSVC does this by zeroing padding bits with __builtin_zero_non_value_bits. Other compiler may use other name, another approach, or not expose atomic<T>::compare_exchange_* internals to this level.


¹ like multiple NaN floating point values

Alex Guteniev
  • 12,039
  • 2
  • 34
  • 79