I know this problem is likely steeped in XY problems, but I found it a nice challenge none-the-less, so I thought about implementing it with the Boost Interval Container library.
The beauty of the library is that you can forget about a lot of details, while you don't sacrifice a lot of performance.
I've taken the liberty to generalize it, so that it is capable of moving multiple blocks of dashes (uniform randomly selected) simultaneously.
The solution runs Live On Coliru and generates 1,000,000 random transpositions of the given sample with randomly varied numbers of moved blocks (1..3) in about 2673 ms (1156 ms on my machine):
Generator gen(test_case);
std::string result;
std::map<std::string, size_t> histo;
for(int i = 0; i < 1000000; ++i) {
auto const mobility = gen.gen_relocations(1 + gen.random(3)); // move n blocks of dashes
result.clear();
gen.apply_relocations(mobility, std::back_inserter(result));
histo[result]++;
}
Note: the benchmark times include the time taken to build the histogram of unique results generated
Let's do a code walk-through here to explain things:
I tried to use "readable" types:
namespace icl = boost::icl;
using Position = size_t;
using Map = icl::interval_set<Position>;
using Region = Map::value_type;
E.g. the function that builds the Map
of dashes is simply:
template <typename It> Map region_map(It f, It l) {
Map dashes;
for (Position pos = 0; f!=l; ++f, ++pos)
if ('-' == *f)
dashes += pos;
return dashes;
}
Note how I didn't particularly optimize this. I let the interval_set combine adjacent dashes. We might use hinted insertion, or a parser that add consecutive dashes as a block. I opted for KISS here.
Later on, we generate relocations, which map a Region
onto a non-moving Position
in the rest of the text.
using Relocs = boost::container::flat_multimap<Position, Region>;
By using the flat multimap, the caller has the entries already sorted by ascending insertion point. Because we use a reserve()
-ed up-front flat multimap, we avoid the overhead of a node based implementation of map here.
We start by picking the dash-blocks to be moved:
Map pick_dashes(int n) {
Map map;
if (!_dashes.empty())
for (int i = 0; i < n; ++i)
map += *std::next(_dashes.begin(), _select(_rng));
return map;
}
The random distribution have been dimensioned at construction, e.g.:
_dashes(region_map(_input.begin(), _input.end())),
_rng(std::random_device {}()),
_select (0, _dashes.iterative_size() - 1),
_randpos(0, _input.size() - 1),
Next, we assign insertion-positions to each. The trick is to assign positions inside non-mobile (inert) regions of the source.
- this includes other blocks of dashes that stay in their place
there is the degenerate case where everything is a block of dashes, we detected this in the constructor:
_is_degenerate(cardinality(_dashes) == _input.size())
So the code reads as follows:
Relocs gen_relocations(int n=1) {
Map const moving = pick_dashes(n);
Relocs relocs;
relocs.reserve(moving.iterative_size());
if (_is_degenerate)
{
// degenerate case (everything is a dash); no non-moving positions
// exist, just pick 0
for(auto& region : moving)
relocs.emplace(0, region);
} else {
auto inertia = [&] {
Position inert_point;
while (contains(moving, inert_point = _randpos(_rng)))
; // discard insertion points that are moving
return inert_point;
};
for(auto& region : moving)
relocs.emplace(inertia(), region);
}
return relocs;
}
Now all we need to do is apply the relocations.
The generic implementation of this is pretty straightforward. Again, it's not particularly optimized in order to keep it simple (KISS):
template <typename F>
void do_apply_relocations(Relocs const& mobility, F const& apply) const {
icl::split_interval_set<Position> steps {{0, _input.size()}};
for (auto& m : mobility) {
steps += m.first; // force a split on insertion point
steps -= m.second; // remove the source of the move
//std::cout << m.second << " moving to #" << m.first << ": " << steps << "\n";
}
auto next_mover = mobility.begin();
for(auto& step : steps) {
while (next_mover!=mobility.end() && contains(step, next_mover->first))
apply((next_mover++)->second, true);
apply(step, false);
}
}
Note The trick here is that we "abuse" the split_interval_set
combining strategy to break the processing into sub-ranges that "stop" at the randomly generated insertion points: these artificial regions are the "steps" in our generation loop.
The apply
function there is what we implement to get what we want. In our case we want a string like A--TG-DFGR(----)--
so we write an overload that appends to a container (e.g. std::string
) using any output iterator:
template <typename Out>
Out apply_relocations(Relocs const& mobility, Out out) const {
if (_is_degenerate)
return std::copy(_input.begin(), _input.end(), out);
auto to_iter = [this](Position pos) { return _input.begin() + pos; };
do_apply_relocations(mobility, [&](Region const& r, bool relocated) {
if (relocated) *out++ = '(';
out = std::copy(
to_iter(first(r)),
to_iter(last(r)+1),
out
);
if (relocated) *out++ = ')';
});
return out;
}
Note The "complicated" part here are mapping the Position
to input iterators (to_iter
) and the code to optionally add ()
around moved blocks.
And with that, we have seen all the code.
Full Listing
#include <boost/container/flat_map.hpp>
#include <boost/icl/interval_set.hpp>
#include <boost/icl/split_interval_set.hpp>
#include <boost/icl/separate_interval_set.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/range/algorithm.hpp>
#include <iomanip>
#include <iostream>
#include <random>
#include <map>
#include <chrono>
namespace icl = boost::icl;
using Position = size_t;
using Map = icl::interval_set<Position>;
using Region = Map::value_type;
using Relocs = boost::container::flat_multimap<Position, Region>;
struct Generator {
Generator(std::string const& input)
: _input(input),
_dashes(region_map(_input.begin(), _input.end())),
_rng(std::random_device {}()),
_select (0, _dashes.iterative_size() - 1),
_randpos(0, _input.size() - 1),
_is_degenerate(cardinality(_dashes) == _input.size())
{
}
unsigned random(unsigned below) {
return _rng() % below; // q&d, only here to make the tests deterministic for a fixed seed
}
Map full() const {
return Map { { 0, _input.size() } };
}
Relocs gen_relocations(int n=1) {
Map const moving = pick_dashes(n);
Relocs relocs;
relocs.reserve(moving.iterative_size());
if (_is_degenerate)
{
// degenerate case (everything is a dash); no non-moving positions
// exist, just pick 0
for(auto& region : moving)
relocs.emplace(0, region);
} else {
auto inertia = [&] {
Position inert_point;
while (contains(moving, inert_point = _randpos(_rng)))
; // discard insertion points that are moving
return inert_point;
};
for(auto& region : moving)
relocs.emplace(inertia(), region);
}
return relocs;
}
template <typename Out>
Out apply_relocations(Relocs const& mobility, Out out) const {
if (_is_degenerate)
return std::copy(_input.begin(), _input.end(), out);
auto to_iter = [this](Position pos) { return _input.begin() + pos; };
do_apply_relocations(mobility, [&](Region const& r, bool relocated) {
if (relocated) *out++ = '(';
out = std::copy(
to_iter(first(r)),
to_iter(last(r)+1),
out
);
if (relocated) *out++ = ')';
});
return out;
}
template <typename F>
void do_apply_relocations(Relocs const& mobility, F const& apply) const {
icl::split_interval_set<Position> steps {{0, _input.size()}};
for (auto& m : mobility) {
steps += m.first; // force a split on insertion point
steps -= m.second; // remove the source of the move
//std::cout << m.second << " moving to #" << m.first << ": " << steps << "\n";
}
auto next_mover = mobility.begin();
for(auto& step : steps) {
while (next_mover!=mobility.end() && contains(step, next_mover->first))
apply((next_mover++)->second, true);
apply(step, false);
}
}
private:
std::string _input;
Map _dashes;
std::mt19937 _rng;
std::uniform_int_distribution<Position> _select;
std::uniform_int_distribution<Position> _randpos;
bool _is_degenerate;
Map pick_dashes(int n) {
Map map;
if (!_dashes.empty())
for (int i = 0; i < n; ++i)
map += *std::next(_dashes.begin(), _select(_rng));
return map;
}
template <typename It> Map region_map(It f, It l) {
Map dashes;
for (Position pos = 0; f!=l; ++f, ++pos)
if ('-' == *f)
dashes += pos;
return dashes;
}
};
int main() {
for (std::string test_case : {
"----",
"A--TG-DF----GR--",
"",
"ATGDFGR",
})
{
auto start = std::chrono::high_resolution_clock::now();
Generator gen(test_case);
std::string result;
std::map<std::string, size_t> histo;
for(int i = 0; i < 1000000; ++i) {
auto const mobility = gen.gen_relocations(1 + gen.random(3)); // move n blocks of dashes
result.clear();
gen.apply_relocations(mobility, std::back_inserter(result));
histo[result]++;
}
std::cout << histo.size() << " unique results for '" << test_case << "'"
<< " in " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now()-start).count() << "ms\n";
std::multimap<size_t, std::string, std::greater<size_t> > ranked;
for (auto& entry : histo)
ranked.emplace(entry.second, entry.first);
int topN = 10;
for (auto& rank : ranked)
{
std::cout << std::setw(8) << std::right << rank.first << ": " << rank.second << "\n";
if (0 == --topN)
break;
}
}
}
Prints e.g.
1 unique results for '----' in 186ms
1000000: ----
3430 unique results for 'A--TG-DF----GR--' in 1156ms
9251: A(----)--TG-DFGR--
9226: (----)A--TG-DFGR--
9191: A--T(----)G-DFGR--
9150: A--TG-DFGR-(----)-
9132: A--(----)TG-DFGR--
9128: A--TG(----)-DFGR--
9109: A--TG-D(----)FGR--
9098: A--TG-DFG(----)R--
9079: A--TG-DFGR(----)--
9047: A-(----)-TG-DFGR--
1 unique results for '' in 25ms
1000000:
1 unique results for 'ATGDFGR' in 77ms
1000000: ATGDFGR