2

I am using std::bitset to represent short DNA sequences (haplotypes). For my purposes, I need to be able to convert each such sequence into a single integer. At the moment, it seems like this requirement bounds my sequences to be of length <=64 because of the way std::bitset::to_ullong works?

Since there are ways to represent 128-bit integers in C++ (Is there a 128 bit integer in C++?), I am wondering how long it will take before std::bitset supports direct conversion to 128-bit integers as well?

Of course I would be more than happy to learn that I am wrong and one can already do that...

Thanks!

gvbarroso
  • 21
  • 4
  • C++ doesn't have a guaranteed 128 bit type. That said, there are plenty of libraries/extensions that do, and a lot of them allow construction from a string which `bitset` can provide. – NathanOliver Nov 01 '21 at 16:22
  • 4
    Not sooner than C++26. – Quimby Nov 01 '21 at 16:22
  • As long C++ doesn't support 128 bit integer, there's no reason `bitset` should support a direct conversion for it – Ranoiaetep Nov 01 '21 at 16:23
  • @Quimby is there concrete reason to believe it may come in C++26? – gvbarroso Nov 01 '21 at 16:28
  • 2
    @user3597591 No, not at all. I do not think there is any almost-done proposal for 128bit types. But it cannot come sooner as C++23 has been fixed already AFAIK. – Quimby Nov 01 '21 at 16:30
  • 1
    Not reliable, but GCC seems to be optimizing the manual operation as well as one could hope: https://gcc.godbolt.org/z/rj3PYTx5h –  Nov 01 '21 at 16:30

1 Answers1

5

You could always provide your own:

#include <climits>
#include <cstdint>
#include <limits>
#include <bitset>

static_assert(sizeof(unsigned long long) == 8 && CHAR_BIT == 8);

struct my_uint128_t {
    std::uint64_t lo;
    std::uint64_t hi;
};

template<std::size_t N>
my_uint128_t to_uint128_t(const std::bitset<N>& v) {
  constexpr std::bitset<N> mask(std::numeric_limits<std::uint64_t>::max()); 
  std::uint64_t lo = (v & mask).to_ullong();
  std::uint64_t hi = ((v >> 64) & mask).to_ullong();

  return {lo, hi};
}

However, this makes a few technically unsafe assumptions, such as unsigned long long being 64 bits. The static_assert helps, but a better solution is to actually account for the possible discrepancies:

#include <climits>
#include <limits>
#include <bitset>
#include <array>

template<std::size_t BITS>
struct my_uint_t {
    static constexpr std::size_t num_parts = 
        (BITS / CHAR_BIT + (BITS % CHAR_BIT > 0)) / sizeof(unsigned long long);
    std::array<unsigned long long, num_parts> parts;

    template<std::size_t N, std::size_t I>
    void populate_from_bitset(const std::bitset<N>& v) {
        constexpr std::size_t offset = I * CHAR_BIT * sizeof(unsigned long long);
        constexpr std::bitset<N> mask(std::numeric_limits<unsigned long long>::max());

        parts[I] = ((v >> offset) & mask).to_ullong();
        if constexpr(I + 1 < num_parts) {
            populate_from_bitset<N, I + 1>(v);
        }
    }
};

template<std::size_t N>
my_uint_t<128> to_uint128_t(const std::bitset<N>& v) {
    
  my_uint_t<128> result;
  result.populate_from_bitset<N, 0>(v);

  return result;
}

auto tmp_a = &to_uint128_t<16>;
auto tmp_b = &to_uint128_t<65>;
auto tmp_c = &to_uint128_t<128>;

Both gcc and clang optimize this down to adequately clean assembly on X86-64:

my_uint128_t to_uint128_t<16ul>(std::bitset<16ul> const&):
        movzx   edx, WORD PTR [rdi]
        xor     eax, eax
        ret
my_uint128_t to_uint128_t<65ul>(std::bitset<65ul> const&):
        mov     rdx, QWORD PTR [rdi]
        mov     rax, QWORD PTR [rdi+8]
        ret
my_uint128_t to_uint128_t<128ul>(std::bitset<128ul> const&):
        mov     rdx, QWORD PTR [rdi]
        mov     rax, QWORD PTR [rdi+8]
        ret

see on godbolt

Notes:

  • This is reliant on the standard library being implemented in a certain way. While it'll be guaranteed to work, there is no formal guarantee that performance will be consistent.
  • my_uint128_t is a bit of a placeholder type here. There's still some endianness handling required to use this correctly as a portable 128 bit integer.
  • Why yes, correctly dealing with non-typical integer types leads to a huge mess.
  • As an addendum, the performance consistency concern is a bit theoretical, I can't fathom anyone implementing `std::bitset` in a way that would break this. –  Nov 01 '21 at 16:53