3

I have a massive bitset representing all the bits of a 74MB file. I am using a compression algorithm to create a compressed string representation of this bitset. I need to then store that string into another dynamic bitset so that it can be decompressed later. My issue is that no matter how I try to fill the bitset from a string, it always fills in reverse order.

For simplicity's sake, let's say my compressed string is

1110001101010111011101

Here is my first attempt at filling my output dynamic bitset:

string compressed = 1110001101010111011101;
output = boost::dynamic_bitset<unsigned char> (compressed);

When I do this, my bitset turns into the reverse of the string:

   1011101110101011000111

So then I tried this:

output = boost::dynamic_bitset<unsigned char> (compressed.begin(), compressed.end());

And I get the exact same output. Next I tried using reverse iterators, and I have no idea how this is possible but it fills the bitset the exact same way:

output = boost::dynamic_bitset<unsigned char> (compressed.rbegin(), compressed.rend());

The only way I can get my bitset filled in the proper order is to do this:

for(uint i = 0; i < compressed.size(); i++)
{
    if(compressed[i] == '0') 
       output.push_back(false);
    else output.push_back(true);
}

This fills my output bitset in proper order, however it is significantly slower than using the other method (30 seconds slower with the string I am using). I can also use std::reverse to reverse the string in place then fill the bitset, but this takes a LOT of extra time. Is there any way to efficiently populate a dynamic bitset with values from a string in normal ordering? I understand why it's filled in reverse, but I am not using my bitset to represent an integer, I am using it to store data from a file so I need it to be in order. However it doesn't make sense why using reverse iterators would produce the same output.

EDIT I've screenshotted my output and the relevant portion of my code. The compressed output shows the first 6000 characters of the compressed version of my bitset, stored as a string. This string itself has no problems. Underlined in red is the line I am using to store this string in a bitset boost::dynamic_bitset output. Then, I print the first 6000 characters of the output bitset, and they are completely different. I should note that the "output" bitset is being passed into this function as a reference parameter, but is initially empty. Output

Connor S
  • 353
  • 2
  • 12
  • @eukaryota the complete example I am using is huge. I have 5 files each 400 lines long, using strings with size in the hundreds of millions. I think the problem is my misunderstanding of how dynamic_bitset works, not some other part of the code. I am using g++ with the latest version of boost (installed yesterday). – Connor S Dec 02 '18 at 21:40
  • @ConnorS *I think the problem is my misunderstanding of how dynamic_bitset works, not some other part of the code.* -- So create a small `main` program that uses `dynamic_bitset` in the way you're trying to use it, and then work from there to fix any issues. Once the small program works, then and only then you add it to the larger program. -- *the complete example I am using is huge* -- No programmer should be initially testing a certain component by adding it to their larger codebase, not knowing how to use that component. – PaulMcKenzie Dec 02 '18 at 21:55
  • @eukaryota thank you for this. Your example seems to show that in small test cases, the bitset is populated perfectly. I've edited the post with a screenshot of the results I am getting – Connor S Dec 02 '18 at 22:04
  • @eukaryota https://wandbox.org/permlink/NAojBaZzVGEoWf8a I've made this to try to test huge strings, and this was the max I could get before getting a "file size limit" from too much output. It seems to still work, which is strange because this is exactly how I am implementing it in my code – Connor S Dec 02 '18 at 22:24
  • @eukaryota Well this is interesting. I probably should have done this much earlier, but replacing my huge string with a simple "0000000011111111" and using the output = dynamic_bitset(compressed) method fills the bitset backwards – Connor S Dec 02 '18 at 22:35
  • I also notices that printing the bitset with the << operator will print it in the desired order. I might just iterate through it backwards instead of trying to fill it properly – Connor S Dec 02 '18 at 22:38

1 Answers1

1

Per documentation the std::string constructor of boost::dynamic_bitset will assign the last character of the input to the least-significant bit, which will be the bit with index 0.

Reading it back in a loop with growing index will give the original string in reverse order.

The constructors from iterators will do something completely different. They interpret each character as an integer (the character code) and will save a binary representation for each integer to the bitset.

Considering, that e.g. the stream output operator for dynamic_bitset will print the bitset starting from the highest-significant bit, I think there is no problem storing it this way. Just make sure to consider it if you use loops. Such loops should probably be avoided though, as individual-bit access will be slower than working on whole storage blocks simultaneously. Using a native block size instead of unsigned char would probably be advisable for the same reason.

If you really need to store in the other order reverse your string first:

std::reverse(compressed.begin(), compressed.end());