4

Objective

I'm trying to extract elements withing a container (e.g. vector, set, etc...) but instead of using the index, I am using the bit masking technique.

Scenario:

vector<string> alphabets {"a", "b", "c", "d", "e"};

Test Case:

  1. Input: 5 (eqivalent bitmask: 00101)

    Output: new vector {"c", "e"}

  2. Input 13 (bitmask: 01101)

    Output vector: {"b", "c", "e"}

Naive Working Solution:

vector<string*> extract(int mask){
    vector<string*> result;
    bitset<n> bits(mask);
    for (int j = 0; j < n; ++j) {
        if (bits[j]){
            result.push_back(&alphabets[j]);
        }
    }
}

Further Improvement

  • time complexity
  • space complexity
  • concept/idea??
  • API availability??

Example Use case.

Permutate all combination of a,b,c,d,e, whereby the a,b,c,d,e is wrapped on a container. (other approached has been mentioned in the Generating combinations in c++ question.)

#include <vector>
#include <iostream>
#include <algorithm>
#include <string>
#include <bitset>

using namespace std;


int main(){
    const int n = 5;

    vector<string> alphabets {"a", "b", "c", "d", "e"};

    for ( int i = 0; i < pow(2, n); ++i){
        vector<string*> result;
        bitset<n> bits(i);
        for (int j = 0; j < n; ++j) {
            if (bits[j]){
                result.push_back(&alphabets[j]);
            }
        }

        for (auto r: result){
            cout << *r;
        }
        cout << endl;
    }
    return 0;
}
Community
  • 1
  • 1
Yeo
  • 11,416
  • 6
  • 63
  • 90
  • what is the question? – 463035818_is_not_an_ai Aug 22 '16 at 15:22
  • I'm slightly concern with the performance in the first place, because I'm trying to solve an small problemset of NP complete problem which require me brute force all combinations. Thus, solving this question is only solving small part of my bigger problem. (i.e. Having a better solution of permutate all combinations) – Yeo Aug 22 '16 at 15:22
  • [Related](http://stackoverflow.com/q/4709160/335858). – Sergey Kalinichenko Aug 22 '16 at 15:26
  • 2
    You could use a library function to get the permutations, this is usually faster then your own implementation: std::next_permutation – Silu Aug 22 '16 at 15:32
  • I'm not sure you really need the variable `result` for your small intermediate problem. The variable `bits` of type `bitset` and `alphabet` have the informative content of `result`. For example, the union on `bitset` is far more efficient (it can use the unsigned bit or '|' operator) than a union of `vector`. – Franck Aug 22 '16 at 15:34
  • @Silu exactly that was what I am thinking earlier whether next permutation could be much faster if I could just bit masking the vector directly? However, i'm still unsure if vector does support bitmasking operation to extract the elements. – Yeo Aug 22 '16 at 15:34
  • @Franck could you elaborate more on that. I'm not sure if I understand you correctly, but I tried to use bitwise operator `|` on vector, however a vector is not capable of performing bitwise operator. – Yeo Aug 22 '16 at 15:40
  • @Yeo A naive implementation union on `vector` exploiting the fact that your pointers are sorted within `alphabet` could be: create two synchronized iterators on both vectors. The synchronization would occur on the (sorted) pointer. Then each time a pointer does not appear in the other vector, add it. The complexity is linear in the size of your vectors, but the execution time will be bigger that a naive operator | on bitsets. – Franck Aug 22 '16 at 15:55
  • It's a bit surprising you are using reverse order, i.e. 5 (`00101`) being `{"c", "e"}` instead of `{"a", "c"}`. Do you insist on the reverse order? What I mean, by normal order I would expect the `bit_mask_of_letter = (1 << index_of_letter);`, so `"a"` masked as `00001`, `"b"` masked as `00010`, ... – Ped7g Aug 22 '16 at 16:02
  • @Ped7g thanks for highlighting that, I don't insist on reverse order. By any mean to achieve a better solution is preferred. Because the ultimate goal for me is to permutate all combination, so the order doesn't really matter. – Yeo Aug 22 '16 at 16:06

2 Answers2

3

I think this is a reasonable starting point if you favour performance over readability.

attempt 1

Basically I am avoiding any memory allocations.

#include <string>
#include <vector>
#include <bitset>
#include <iostream>
#include <iterator>
#include <tuple>
#include <array>


template<class From, std::size_t N>
auto
select(From const& from, std::bitset<N> const& bits)
{
    std::array<const std::string*, N> result { nullptr };
    auto i = std::begin(result);
    std::size_t found;
    std::size_t count = found = bits.count();
    std::size_t index = 0;
    while (count)
    {
        if (bits.test(index)) {
            *i++ = &from[index];
            --count;
        }
        ++index;
    }
    return std::make_tuple(found, result);
}

int main()
{

    std::vector<std::string> alphabet = { "a", "b", "c", "d", "e", "f", "g", "h" };

    for (unsigned x = 0 ; x < 256 ; ++x)
    {
        auto info = select(alphabet, std::bitset<8>(x));
        auto ptrs = std::get<1>(info).data();
        auto size = std::get<0>(info);
        while(size--)
        {
            std::cout << *(*ptrs++) << ", ";
        }

        std::cout << '\n';
    }
}

attempt 2 - runtime lookup is in the order of nanoseconds...

Here I am pre-computing all possible alphabets at compile time.

The run-time is of course, blindingly fast. However, an alphabet of more than say, 14 characters could take a while to compile...

update: warning! when I set the alphabet size to 16 clang to consumed 32GB of memory, stopped every other application on the desktop and required a reboot of my macbook before I could do anything else. You have been warned.

#include <string>
#include <vector>
#include <bitset>
#include <iostream>
#include <iterator>
#include <tuple>
#include <array>


template<class From, std::size_t N>
auto
select(From const& from, std::bitset<N> const& bits)
{
    std::array<const std::string*, N> result { nullptr };
    auto i = std::begin(result);
    std::size_t found;
    std::size_t count = found = bits.count();
    std::size_t index = 0;
    while (count)
    {
        if (bits.test(index)) {
            *i++ = &from[index];
            --count;
        }
        ++index;
    }
    return std::make_tuple(found, result);
}

template<std::size_t Limit>
struct alphabet
{
    constexpr alphabet(std::size_t mask)
    : size(0)
    , data { }
    {
        for (std::size_t i = 0 ; i < Limit ; ++i)
        {
            if (mask & (1 << i))
                data[size++] = char('a' + i);
        }
    }

    std::size_t size;
    char data[Limit];

    friend decltype(auto) operator<<(std::ostream& os, alphabet const& a)
    {
        auto sep = "";
        for (std::size_t i = 0 ; i < a.size; ++i)
        {
            std::cout << sep << a.data[i];
            sep = ", ";
        }
        return os;
    }
};

template<std::size_t Limit>
constexpr alphabet<Limit> make_iteration(std::size_t mask)
{
    alphabet<Limit> result { mask };
    return result;
}

template<std::size_t Limit, std::size_t...Is>
constexpr auto make_iterations(std::index_sequence<Is...>)
{
    constexpr auto result_space_size = sizeof...(Is);
    std::array<alphabet<Limit>, result_space_size> result
    {
        make_iteration<Limit>(Is)...
    };
    return result;
}

template<std::size_t Limit>
constexpr auto make_iterations()
{
    return make_iterations<Limit>(std::make_index_sequence<std::size_t(1 << Limit) - 1>());
}

int main()
{
    static constexpr auto alphabets = make_iterations<8>();
    for(const auto& alphabet : alphabets)
    {
        std::cout << alphabet << std::endl;
    }
}

attempt 3

Using a very rudimentary fixed capacity container of pointers to matching elements. I've added constexpr. This won't improve the majority of cases but could certainly improve statically allocated selections.

#include <vector>
#include <bitset>
#include <iostream>
#include <iterator>
#include <tuple>
#include <array>

namespace quick_and_dirty {

template<class T, std::size_t Capacity>
struct fixed_capacity_vector
{
    using value_type = T;

    constexpr fixed_capacity_vector()
    : store_()
    , size_(0)
    {}

    constexpr auto push_back(value_type v)
    {
        store_[size_] = std::move(v);
        ++ size_;
    }

    constexpr auto begin() const { return store_.begin(); }
    constexpr auto end() const { return begin() + size_; }

private:
    std::array<T, Capacity> store_;
    std::size_t size_;
};

} // namespace quick_and_dirty

template<class From, std::size_t N>
constexpr
auto
select(From const& from, std::bitset<N> const& bits)
{
    using value_type = typename From::value_type;
    using ptr_type = std::add_pointer_t<std::add_const_t<value_type>>;

    auto result = quick_and_dirty::fixed_capacity_vector<ptr_type, N>();

    std::size_t count = bits.count();

    for (std::size_t index = 0 ; count ; ++index)
    {
        if (bits.test(index)) {
            result.push_back(&from[index]);
            --count;
        }
    }
    return result;
}

int main()
{

    std::vector<std::string> alphabet = { "a", "b", "c", "d", "e", "f", "g", "h" };

    for (unsigned x = 0 ; x < 256 ; ++x)
    {
        for(auto p : select(alphabet, std::bitset<8>(x)))
        {
            std::cout << (*p) << ", ";
        }

        std::cout << '\n';
    }
}
Richard Hodges
  • 68,278
  • 7
  • 90
  • 142
1

Well, then your example can be lot more shorter (shorter to a point, where it probably doesn't demonstrate what you really need) (edit: I planned to output the chars directly to cout, not using result vector at all, then I forgot about it, when I was rewriting the code ... so it's still similar to yours):

#include <vector>
#include <iostream>
#include <string>

int main() {
    const std::vector<std::string> alphabets {"a", "b", "c", "d", "e"};
    const unsigned N = alphabets.size();
    const unsigned FULL_N_MASK = 1 << N;

    for (unsigned mask = 0; mask < FULL_N_MASK; ++mask) {

        std::vector<const std::string*> result;
        unsigned index = 0, test_mask = 1;
        while (index < N) {
            if (mask & test_mask) result.push_back(&alphabets[index]);
            ++index, test_mask <<= 1;
        }

        for (auto r : result) std::cout << *r;
        std::cout << std::endl;
    }
    return 0;
}

Which make me a bit wonder, why do you need to extract that result vector from the mask, maybe you can work with the mask itself only, and fetch the particular string inside inner loop (like I do when I'm building the result).

The major change in my code is to omit bitset<n> bits(i); initialization, as you already have the bits natively in i, easily accessible with C language bitwise operators (<< >> & ^ | ~), there's no need to convert them into the same thing with bitset again.

bitset usage would made sense, if the n would be too large to fit the mask into some regular type (like uint32_t). Even then for reasonably small fixed n I would probably use few 64/128/256b unsigned integers (what is available on your target platform).


About speed: You can't beat ++mask of course. std::next_permutation will be slower than single native machine code instruction, even if it is implemented in the same way for size under 32/64.

But the question is, if you can build your algorithm around that bit mask, to use that advantage effectively.

Many chess engines use bit-mask encoding of various chessboard values, to check easily whether some field is occupied, or do some preliminary turn availability checks in one step, like getting all pawns which can take opponent figure in next turn: attacking_pawns = (board_opponent & (my_pawns<<9) & valid_left_forward_pawn_take_mask) | (board_opponent & (my_pawns<<7) & valid_right_forward_pawn_take_mask); // for the side going "up" on board - 5 native CPU bitwise operators, and if result is 0, you know your pawns can't attack a single thing. Or you have bitmask of opponent pieces which can be taken. Compare that with naive looping trough all your pieces, and checking the [+-1, +1] fields for opponent pieces and adding those to some dynamic allocated memory like vector.

Or whether there exists algorithm capable to prune the combinations tree with early exit, completely skipping some substantial sub-set of permutations, that will be very likely faster than full ++mask scan.

Ped7g
  • 16,236
  • 3
  • 26
  • 63
  • 1
    Ok, so finally understood your technique, our algo is still similar, just that you perform the bitmask directly. and the test_mask is a replacement for my inner for loop. But yep, i agree this is better. In term of the time complexity, our algorithm still perform the same time complexity. But yours with a faster by constant factor. – Yeo Aug 22 '16 at 17:21
  • @Yeo: yes, I can't improve algo, which is not known. If you want full permutations, this is quite optimal (the problematic bit is the result in vector, not calculation of permutations). This may be even faster than having all permutations pre-computed ahead and stored in memory, as the bit mask and alphabets is using only very limited piece of memory, not trashing cache too much. But over 90% of CPU time will be spent in your real algorithm which is doing something with those permutations. So you are sort of prematurely optimizing stuff and maybe in wrong place. – Ped7g Aug 22 '16 at 17:43