1

I need to store fairly large size (1M x 1M) square matrices on a limited memory device. The matrices consist of elements only "1" and "0".

I have read that bool could save 1/0 ( or true/false) but that is also 8 bits storage.So I could have stored 8 times larger size if I could store the values in a single bit.

Instead of going with multidimensional storage I chose to store matrices in single dimension matrix and accesing elements via matrix[row*N + column] where size = N * N

I have a representation below to explain stuff better

+ = 1
o = 0
  0 1 2 3 4 5
0 + o + o o o
1 o + + + o +
2 o o + o + o
3 + o o o o o
4 o o + + o 1

is converted to std::vector<unsigned char> matrix = {1,0,1,0,0,0,...}

I chose the unsigned char to store 1 & 0s and have 8 bits of minimum size in c++ I have the following code as a starting point:


 int each_row;
 int row_val;
  for (int row = 0; row < 8; row++) {
    for (int col = 0; col < 8; col++) {
      
      unsigned char val = matrix[row * N + col];
      each_row = each_row + to_string(val);
      
    }

    row_val = std::stoi(each_row, nullptr, 2); // to int

so my question is how can I store values in each row/col in a single bit of a char to compress data ?

Now each value of val = 0b00000001 or 0b00000000 I want to make use of all bits like the first row and second row combined to be stored as(due to size lower than 8) 0b10100001

WinMa50
  • 13
  • 3
  • 4
    This actually is what [std::vector](https://en.cppreference.com/w/cpp/container/vector_bool) does. – Nathan Pierson Aug 03 '21 at 18:40
  • 1
    ...but consider its downsides. most importantly, not necessarily contiguous and `std::vector::reference` (eg returned by `operator[]`) is actually not `bool&`, though that just follows from the requirement of not storing `bool`s – 463035818_is_not_an_ai Aug 03 '21 at 18:44
  • 3
    Its downsides are overrated. Any decent implementation it'll be contiguous (and probably doesn't matter if its not). `operator[]` not returning a `bool&` is not an issue almost all of the time, and can be worked around if it ever is. If you need the packing its probably the right one to use. – Mike Vine Aug 03 '21 at 18:45
  • 1
    @MikeVine I didnt want to scare OP away from it, just save them from surprises that they might encounter at some point. Funnily I didn't find a good duplciate, but many Q&As about not using it – 463035818_is_not_an_ai Aug 03 '21 at 18:47
  • 2
    1M x 1M bits on a device with limited memory doesn't sound good. I get that to 125 GB. Perhaps a sparse matrix of some kind would be an option. How much memory is "limited memory"? – Ted Lyngmo Aug 03 '21 at 18:47
  • 1
    Limited memory these days could mean anything. Ted and I are likely used to measurements in K when we hear limited, but these days embedded devices with gigabytes of RAM aren't that uncommon. That said I'd still expect many to balk at or start thrashing if you took 1/8 GB. – user4581301 Aug 03 '21 at 18:52
  • Does it have to be _super fast™_? You could keep the actual matrix data in a file instead. – Ted Lyngmo Aug 03 '21 at 18:55
  • Following Ted Lyngmo comment, can you define "limited memory"? How large will your matrix be? How many ones? Can you run length encode it? – Costantino Grana Aug 03 '21 at 21:33

1 Answers1

1

This is exactly what std::vector<bool> is for.

#include <vector>
#include <string>


int main()
{
  std::string each_row;
  int row_val;
  int N = 8;
  std::vector<bool> matrix(N*N);
  for (int row = 0; row < N; row++) {
    for (int col = 0; col < N; col++) {
      
      unsigned char val = matrix[row * N + col];
      each_row = each_row + std::to_string(val);
      
    }
    row_val = std::stoi(each_row, nullptr, 2); // to int
  }
}

From cppreference:

std::vector<bool> is a possibly space-efficient specialization of std::vector for the type bool.

The manner in which std::vector<bool> is made space efficient (as well as whether it is optimized at all) is implementation defined. One potential optimization involves coalescing vector elements such that each element occupies a single bit instead of sizeof(bool) bytes.

std::vector<bool> behaves similarly to std::vector, but in order to be space efficient, it:

  • Does not necessarily store its elements as a contiguous array.
  • Exposes class std::vector<bool>::reference as a method of accessing individual bits. In particular, objects of this class are returned by operator[] by value.
  • Does not use std::allocator_traits::construct to construct bit values.
  • Does not guarantee that different elements in the same container can be modified concurrently by different threads.

Whether a specialization for std::vector<bool> was a good idea is a much discussed issue. See eg Why isn't vector<bool> a STL container?. Though, the differences mainly show up in generic code, where sometimes one has to add special cases, because std::vector<bool> is not like other std::vectors in its details.

463035818_is_not_an_ai
  • 109,796
  • 11
  • 89
  • 185
  • *"... possibly ..."* - If I'm reading it correctly, no optimization is guaranteed by the standard. – Evg Aug 03 '21 at 19:14
  • 1
    @Evg thats how I read it as well. To my understanding the spec is quite liberal such that it does allow the optimization, but not require it. Now I wonder if there is a portable way to figure out if `std::vector` actually does use 1 bit per element – 463035818_is_not_an_ai Aug 03 '21 at 19:16
  • 1
    @Evg actually the standard is less liberal than cppreference suggests: "Unless described below, all operations have the same requirements and semantics as the primary vector template, except that operations dealing with the bool value type map to bit values in the container storage and allocator_­traits​::​construct is not used to construct these values." see [here](https://eel.is/c++draft/vector.bool#2). I am not fluent in standardese but after reading this I doubt that any implementation does not store bits rather than bools – 463035818_is_not_an_ai Aug 03 '21 at 19:23
  • Thanks for finding it. I would say this wording is somewhat vague and doesn't define such mapping as mapping between `bool` and exactly one bit. – Evg Aug 03 '21 at 19:48