You can do better with the very fortunate
#define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS
which I noticed first because it's used in other third-party libraries (I forget the name, but it was AI/ML related).
I had a version here that wasn't very generic because it used size-specific bit twiddling hacks (to reverse e.g. byte or uint32). You might be interested in those:
You can still see the uint32-dedicated version Live On Compiler Explorer.
Generic Version
Since then I spotted this nice answer: In C/C++ what's the simplest way to reverse the order of bits in a byte? and it gave a reasonably efficient in-place reverse algorithm for power-of-2 width integral types. So, now we have the completely generic:
// make sure it's globally defined
#define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS
#include <boost/dynamic_bitset.hpp>
#include <iostream>
template <typename Block, typename Allocator>
void reverse(boost::dynamic_bitset<Block, Allocator>& bs) {
auto constexpr BLOCK_BIT = sizeof(Block) * CHAR_BIT;
auto original_size = bs.size();
if (auto partial_block = bs.size() % BLOCK_BIT) {
auto pad = (BLOCK_BIT - partial_block);
bs.resize(bs.size() + pad);
bs <<= pad;
}
// see https://stackoverflow.com/a/61109975/85371
auto inplace = [](Block& n) {
static_assert(std::is_unsigned_v<Block>);
short bits = sizeof(n) * 8;
Block mask = ~Block(0); // equivalent to uint32_t mask =
// 0b11111111111111111111111111111111;
while (bits >>= 1) {
mask ^= mask << (bits); // will convert mask to
// 0b00000000000000001111111111111111;
n = (n & ~mask) >> bits | (n & mask) << bits; // divide and conquer
}
};
for (auto& b : bs.m_bits) {
inplace(b);
}
std::reverse(begin(bs.m_bits), end(bs.m_bits));
bs.resize(original_size);
}
NOTE the reversal will MUCH more efficient for size()
multiples of BLOCK_BIT
. This might in some scenario even lead one to prefer Block = std::uint8_t
.
Generic Tester/Benchmark
Let's write some randomized tests. For ease of implementation we compare the reversed string representation to the string representation of the reverse.
I added tests for varying block sizes and statistics on the sizes and times:
Live On Compiler Explorer
// For quick testing
#include <random>
#include <chrono>
#include <boost/range/algorithm/reverse.hpp>
#include <boost/lexical_cast.hpp>
static auto now = std::chrono::high_resolution_clock::now;
using namespace std::chrono_literals;
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>
namespace ba = boost::accumulators;
namespace bat = ba::tag;
template <typename Block> bool run_test(unsigned n, auto& stats) {
using BitSet = boost::dynamic_bitset<Block>;
auto gen = std::bind(std::uniform_int_distribution<Block>{},
std::mt19937{std::random_device{}()});
while (n--) {
Block init[]{gen(), gen(), gen()};
auto sz = std::bind(
std::uniform_int_distribution(0ul, sizeof(init) * CHAR_BIT),
std::mt19937{std::random_device{}()});
BitSet bs(std::begin(init), std::end(init));
bs.resize(sz());
stats(bs.size());
std::string expected = boost::lexical_cast<std::string>(bs);
boost::reverse(expected);
BitSet revclone = bs;
reverse(revclone);
auto actual = boost::lexical_cast<std::string>(revclone);
if (expected != actual) {
std::cout << __PRETTY_FUNCTION__ << " '" << bs << "': \n"
<< " expected: " << expected << "\n"
<< " actual: " << actual << "\n";
return false;
}
}
return true;
}
int main() {
auto start = now();
ba::accumulator_set<double,
ba::stats<bat::mean, bat::variance, bat::min, bat::max>>
stats;
if (run_test<unsigned char>(10'000, stats))
std::cout << "Completed 10'000 tests with char blocks\n";
if (run_test<uint16_t>(10'000, stats))
std::cout << "Completed 10'000 tests with uint16_t blocks\n";
if (run_test<uint32_t>(100'000, stats))
std::cout << "Completed 100'000 tests with uint32_t blocks\n";
if (run_test<uintmax_t>(1'000'000, stats))
std::cout << "Completed 1'000'000 tests with uintmax_t blocks\n";
auto cost = ((now() - start)/1.us) / ba::count(stats);
std::cout
<< "Samples " << ba::count(stats)
<< " mean: " << ba::mean(stats)
<< " min: " << ba::min(stats)
<< " max: " << ba::max(stats)
<< " stddev: " << std::sqrt(ba::variance(stats))
<< "\n";
std::cout << "Average cost " << cost << "μs\n";
}
Prints, on my machine:
Completed 10'000 tests with char blocks
Completed 10'000 tests with uint16_t blocks
Completed 100'000 tests with uint32_t blocks
Completed 1'000'000 tests with uintmax_t blocks
Samples 1120000 mean: 90.3283 min: 0 max: 192 stddev: 55.9233
Average cost 3.69335μs
real 0m4,141s
user 0m4,061s
sys 0m0,003s
So, with a mean size of 90 bits, bitsets up to 192 bits can be reversed in less than than 4μs. Not bad.
Proper Micro Benchmarks
Using Nonius, we can get reliable data from predictable tests. For sizes 31, 32, 37 bits net timings are in the 10-30ns range.
Code used:
#define NONIUS_RUNNER
#include <nonius/nonius.h++>
#include <nonius/main.h++>
template <typename Block> void run_test(nonius::chronometer& cm, size_t target_size) {
using BitSet = boost::dynamic_bitset<Block>;
static const std::string data{
"0100110111010010010001100111010010010001011100100100111010100010011010"
"01100000011000010001110111"};
BitSet bs(data, 0, target_size);
assert(bs.size() == target_size);
cm.measure([&] { reverse(bs); });
}
NONIUS_BENCHMARK("Block=uchar, sz=32", [](nonius::chronometer cm) { run_test<uint8_t>(cm, 32); })
NONIUS_BENCHMARK("Block=uint16_t, sz=32", [](nonius::chronometer cm) { run_test<uint16_t>(cm, 32); })
NONIUS_BENCHMARK("Block=uint32_t, sz=32", [](nonius::chronometer cm) { run_test<uint32_t>(cm, 32); })
NONIUS_BENCHMARK("Block=uintmax_t, sz=32", [](nonius::chronometer cm) { run_test<uintmax_t>(cm, 32); })
NONIUS_BENCHMARK("Block=uchar, sz=31", [](nonius::chronometer cm) { run_test<uint8_t>(cm, 31); })
NONIUS_BENCHMARK("Block=uint16_t, sz=31", [](nonius::chronometer cm) { run_test<uint16_t>(cm, 31); })
NONIUS_BENCHMARK("Block=uint32_t, sz=31", [](nonius::chronometer cm) { run_test<uint32_t>(cm, 31); })
NONIUS_BENCHMARK("Block=uintmax_t, sz=31", [](nonius::chronometer cm) { run_test<uintmax_t>(cm, 31); })
NONIUS_BENCHMARK("Block=uchar, sz=37", [](nonius::chronometer cm) { run_test<uint8_t>(cm, 37); })
NONIUS_BENCHMARK("Block=uint16_t, sz=37", [](nonius::chronometer cm) { run_test<uint16_t>(cm, 37); })
NONIUS_BENCHMARK("Block=uint32_t, sz=37", [](nonius::chronometer cm) { run_test<uint32_t>(cm, 37); })
NONIUS_BENCHMARK("Block=uintmax_t, sz=37", [](nonius::chronometer cm) { run_test<uintmax_t>(cm, 37); })
Full interactive chart: http://stackoverflow-sehe.s3.amazonaws.com/974d10e8-74ae-4fcf-be03-6a0d0e01b5ad/stats.html
