1

I created a data structure to compactly represent an array of small integers:

/*
 * Compactly represents an array of N unsigned integers, where each one only
 * requires B bits to store.
 */
template<uint32_t N, uint8_t B>
class __attribute__((packed)) small_int_array {
private:
  static const uint32_t items_per_page = 64 / B;
  static const uint32_t num_pages = (N + items_per_page - 1) / items_per_page;
  static const uint64_t mask_unit = (1UL << B) - 1;

  struct helper_t {
    uint32_t page;
    uint8_t offset;

    helper_t(uint32_t index) : page(index/items_per_page),
      offset(index%items_per_page) {}
  };

  uint64_t _pages[num_pages];

public:
  small_int_array() { memset(this, 0, sizeof(this)); }

  uint8_t get(uint32_t index) const {
    helper_t helper(index);
    uint8_t shift = B*helper.offset;
    return (_pages[helper.page] & (mask_unit << shift)) >> shift;
  }

  void set(uint32_t index, uint8_t value) {
    helper_t helper(index);
    uint8_t shift = B*helper.offset;
    _pages[helper.page] &= ~0UL - (mask_unit << shift);
    _pages[helper.page] |= ((uint64_t)value) << shift;
  }
};

I then implemented this special method:

  /*
   * Returns a uniformly random index such that get(index)==value.
   * Returns -1 if no such index exists.
   */
  int32_t get_random_index(uint8_t value) const {
    int32_t candidates[N];
    int size=0;
    uint32_t index = 0;
    for (int i=0; i<num_pages; ++i) {
      uint64_t page = _pages[i];
      for (int j=0; j<items_per_page; ++j) {
        candidates[size] = index++;
        if (index==N) break;
        bool match = (page & mask_unit) == value;
        size += match ? 1 : 0;
        page = page >> B;
      }
    }
    if (size==0) return -1;
    return candidates[rand() % size];
  }

This does what I want, but I'm wondering if there is a more efficient implementation using bit tricks in lieu of the second for-loop. I'm open to changing the bit representation of the object as long as its size doesn't increase.

I'm using N~=100 and B<=4.

dshin
  • 2,354
  • 19
  • 29
  • Aren't those two nested loops the equivalent of enumerating the function `get` for index 0..max ? If so, how about just one loop that does that, recording candidates when the get get result matches the passed value? – danh Jul 11 '16 at 03:48
  • @danh Yes, that is equivalent, and would probably make for clearer code at the expense of a few more arithmetic operations. – dshin Jul 11 '16 at 04:32
  • That get code looks pretty tight, especially if you inline it in the loop. You could inline the helper() code, too, and maybe find some additional savings. – danh Jul 11 '16 at 04:48

1 Answers1

0

I don't see a lot of options for avoiding the inner loop. This looks pretty much fundamental to me: you have to check each value in the "page" whether it matches the parameter value. Need a loop for that. Seems pretty fundamental to me.

The only way I see of avoiding an explicit loop is to write a hairy specialized function that, essentially, generates an explicit compile-time check for every value in the page. That's possible, given that you've already worked out items_per_page. Feed that into std::index_sequence to get all your indexes, and pass them to a variadic function that manually compares each item in the "page" with value.

Technically that will avoid an explicit inner loop, but I doubt that it'll make much of a difference.

Sam Varshavchik
  • 114,536
  • 5
  • 94
  • 148
  • Thanks. Do you see any possible improvement if we fix N=64, B=1, and value=0? – dshin Jul 11 '16 at 02:46
  • If `N` is 64, and `B` is 1, then you can forget the whole thing, because what you have there is the hapless `std::vector`. If you still insist on rolling your own [this question will give you some ideas to try](http://stackoverflow.com/questions/53161/find-the-highest-order-bit-in-c). – Sam Varshavchik Jul 11 '16 at 02:49
  • Thanks. My actual usage involves different values of N and B, so I need my implementation. I was just wondering if there might be a better implementation of `get_random_index()` for a simple case. If we can't improve for even the simple case, then there's probably not much that can be done for the general case. – dshin Jul 11 '16 at 02:58