0

I have this very big vector<unsigned char> bitmap that I use as a bitmap. Most of time it has like millions or billions of entries. So every bit in a unsigned char stands for a free(0) or used(1) block in my program. My function to set those bits looks like this:

int setNextFreeIDX_In(std::vector<unsigned char> & bitmap){
    for(int i=0; i<bitmap.size(); i++){
        for(int bit=0; bit<8;  bit++){
            if(!(bitmap[i] & (1 << bit))){    //72.33% <- this is what the Xcode Profile says
                turnOn(bit, bitmap[i]);
                return (i*8) + bit;
            }
        }
    }
    return -1;
}

So after profiling my Code with Xcode instruments, it says that this function takes a lot of time. (about 25 minutes! when working with about 4.5 gb of blocks and every block is 4096 bytes)

do you see anything that could take that much time. Or maybe there is a better way to do this!? I already tried to do this with the iterator instead of for(int i.... but it still takes a lot of time. Here is my turnOn function:

void turnOn(int bitNumber, unsigned char & byte){        //75.00% <- Profiler says this    
    byte |= 0x01 << bitNumber; //turn on the N-th bit
}
  • One quick way to speed this up is to check if `bitmap[i]` is 255 before the `bit` loop. Although if 75% of your time is spent setting bits in `turnOn`, you're setting a lot of bits and there likely isn't a whole lot you can do to speed that up. Do you have optimizations turned on? – 1201ProgramAlarm Feb 13 '21 at 17:37
  • How important is it that it is the first bit not set in the array, or will any unset bit do? – Surt Feb 13 '21 at 17:48
  • @Surt it is very important that it is the first bit not set. Because its like an index in a file. If I would use any unset bit, I would have a lot of trash in my file and that would cause other problems. So there is no way around that unfortunately – Jack Chaker Feb 13 '21 at 17:58
  • @1201ProgramAlarm im trying -O3fast flag right now and checking if bitmap[I] == 255. It runs good. The 4,5 Gb file takes 8 minutes instead of 30. Thx a lot ! – Jack Chaker Feb 13 '21 at 18:02
  • Are any bit ever unset, at random positions? – Surt Feb 13 '21 at 18:42
  • @Surt in the beginning everything is 0 except the first one. then one after another is set to 1 while I add files into my system. if I remove a file from my system there could be unset bits anywhere in the bitmap. – Jack Chaker Feb 13 '21 at 22:02

2 Answers2

0

An O(N) solution is not really good, maybe we can let us inspire by van Emde Boas tree but a bit simpler and reduce that to O(sqrt(N))

if we take the SqrtN = sqrt of N (rounded up) and makes that many indexes of

struct index {
  int32_t seq; // index into the section 
  int32_t free;
};

Warning: totally untested concept code

std::vector<index> tree(SqrtN, { 0, SqrtN });
tree[SqrtN-1].free = N-SqrtN*(SqrtN-1); // remainder, hopefully correct
index root { 0, N };

int64_t setNextFreeIDX_In(std::vector<unsigned char> & bitmap){
  if (root.free == 0)
    throw "No free index";
  // find first potentially free
  auto found = std::find(std::advance(tree.begin(), root.seq), tree.end(), 
                         [](index& i){ return (i.free > 0);}); // O(SqrtN)
  if (found == tree.end())
    throw "Tree is corrupt";
  // find first free in sequence
  auto idx = std::distance(tree.begin(), found) * SqrtN;
  
  auto start = (idx+found->seq)/8;
  for(int i=start; i<start+SqrtN; i++){ O(SqrtN)
    if (bitmap[i] != 255) {
      for(int bit=0; bit<8;  bit++){
        if(!(bitmap[i] & (1 << bit))){
          turnOn(bit, bitmap[i]); 
          // update tree and root
          root.free--;
          if (--found->free == 0);
            root.seq++;
          found->seq = (i-idx)*8+bit;

          return (i*8) + bit;
        }
      }
    } 
  }
}

int SqrtUp(int64_t pos) {
  return std::ceil(std::sqrt(pos));
}

bool turnOff(int bitNumber, unsigned char & byte){
    bool res = byte & (0x01 << bitNumber);
    byte &= ~(0x01 << bitNumber); // turn off the N-th bit
    return res;
}

int deleteEntry(int64_t pos) {
  // update tree and root
  int seq = SqrtUp(pos);
  if (seq < root.seq)
    root.seq = seq;
  int remain = pos - seq * SqrtN;
  tree[seq].seq = remain;
  tree[seq].free++;
  auto res = turnOff(remain & 0x7, bitmap[pos/8));
  // optional check for res == true
  ASSERT(res); // for use in debugging
}

Further speed improvements should be possible by using 64-bit numbers instead of bytes and using either C++20 bit-header or some buildin bit manipulation, i64 has single instructions to do that in O(1).

Surt
  • 15,501
  • 3
  • 23
  • 39
0

So i simply solved this by adding a static var int lastSettedBlockIDX this helps me to jump right at the last setted bit. Chances are high that the next bit would be 0 so I don't have to loop through all idx´s anymore. Still there could be cases where iterating through the whole list would be necessary. For example after there are unsetted bits in the middle and setted bits in the end of the list. So I will take a closer look @Surt answer later if this Problem appears. Thx a lot.!