I want to store a sequence of bool values in Azure Blob Storage, along with a separate map file (a comma-separated list of names in the same order as the bool values).

As efficiency and storage are important factors I want to store the bool values as a byte array.

I have gone about this by constructing a BitArray and then using BitArray.CopyTo to copy it into a byte array. I tested this locally by converting it back with new BitArray(mybytearray) and comparing the result, and it matches.

My question is: is this code going to be reliable, or will it be environment/hardware specific? Does endianness (big/little endian) come into play here? When I deploy this to an Azure service running on Windows, will it behave the same? Would running on a Linux VM cause endianness to produce wrong output? (A separate Azure service will actually be the one reading from the blob, but it should be configured the same way, i.e. Windows/plan.)

I am a bit confused. While my code seems to be working fine, comments on the accepted answer to the SO post below say that the way BitArray writes to a byte array reverses the order, so you should explicitly reverse the array before putting it into the byte array: Convert from BitArray to Byte

This is a minimal code example of my logic (my case has hundreds of thousands of booleans):

    var boolarr = new bool[] { true, true, false, false, false };
    var bitarr = new BitArray(boolarr);

    // One byte per 8 bits, rounded up so a partial final byte is included.
    var length = (bitarr.Length + 7) / 8;
    var bytearray = new byte[length];
    bitarr.CopyTo(bytearray, 0);

    // Round trip: rebuild a BitArray from the packed bytes.
    BitArray bits = new BitArray(bytearray);

    // Check the round trip. Only the first bitarr.Length bits are compared,
    // because the last byte may contain unused padding bits.
    for (var i = 0; i < bitarr.Length; i++)
    {
        if (bits[i] != bitarr[i])
        {
            throw new Exception(); // This would be bad.
        }
    }
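
On the bit-order worry: the documentation for the BitArray(byte[]) constructor states that byte 0 holds bits 0 through 7 and that the least significant bit of each byte holds the lowest index. The sketch below packs the example values and exposes the resulting bytes, so the layout can be checked on whatever environment the code is deployed to rather than assumed. The class and method names are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections;

// Hypothetical helper for inspecting the layout; not from the original post.
public static class BitArrayLayoutCheck
{
    // Packs the values with BitArray.CopyTo, one bit per value.
    public static byte[] Pack(bool[] values)
    {
        var bits = new BitArray(values);
        var bytes = new byte[(bits.Length + 7) / 8]; // round up to whole bytes
        bits.CopyTo(bytes, 0);
        return bytes;
    }

    // Reads back the first `count` bits; anything past `count` is padding.
    public static bool[] Unpack(byte[] bytes, int count)
    {
        var bits = new BitArray(bytes);
        var values = new bool[count];
        for (var i = 0; i < count; i++)
        {
            values[i] = bits[i];
        }
        return values;
    }
}
```

For { true, true, false, false, false }, Pack returns a single byte with value 0x03: indices 0 and 1 land in the two lowest bits. The reader has to know the original count (here it could come from the map file), since the padding bits in the last byte are indistinguishable from false values.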

F Dev
  • In what world would your little bool array be using excessive resources? This looks like an unnecessary micro-optimization to solve a non-existent problem. – Nigel Feb 07 '22 at 23:15
  • So is the goal here just representing a `bool[]` as a `byte[]`, just to save a bit of storage (.NET `bool` is 32-bit, vs `byte` is 8 bits, that's the idea?) - where does the `BitArray` come in? Why not just go `bool[]` -> `byte[]`? – CoolBots Feb 07 '22 at 23:22
  • @CoolBots it looks like they are trying to use every single bit in each byte, effectively reducing the data size by a factor of 8. bools are 1 byte in C# – Nigel Feb 07 '22 at 23:23
  • @NigelBess then there must be a mistake in the OP's write-up - "As efficiency and storage are important factors I want to store the bool values as a **byte** array.". Also, you're right, `bool` is same size as `byte`... – CoolBots Feb 07 '22 at 23:25
  • @CoolBots. Nope, they are making use of all the bits in the bytes. Effectively they are replacing this: (0b01,0b01,0b00,0b00,0b00) with this: 0b11000 – Nigel Feb 07 '22 at 23:27
  • @NigelBess i need to store up to 800,000 boolean values, this is across thousands of files using the same map to read. The current method is saving them as CSV values into blob effectively requiring each value to take at minimum a char worth of space. – F Dev Feb 07 '22 at 23:31
  • To be more specific, each value actually has 3 states, so I want to represent them as pairs of bits (true,true; true,false; false,false) instead of as a character each. – F Dev Feb 07 '22 at 23:38
  • Endianness only comes into play with multi-byte values. An array of bytes is stored in the same order on any platform, and the physical order of bits within a byte is opaque to high-level languages. – 500 - Internal Server Error Feb 07 '22 at 23:41
  • @FDev ok, but why do you need a `BitArray`? why not just shift the values into their desired positions directly into a `byte` within the resulting `byte[]`? Also, endianness is a non-issue if it's your own system that's reading and writing the values - you know how you wrote them, so read them back the same way. – CoolBots Feb 07 '22 at 23:42
  • @CoolBots the idea is in a byte I can fit in 8 bool values, my data needs 2 bits to be represented so I can fit in 4 of my values in a single byte. Not only is it less storage but it is less data moving around the different services in the system as the system is made up of multiple azure resources that are sending this information around. I am iterating through data to generate the bitarray, i could just as easily create a bool array, but my understanding is a bool array is actually going to be 1 byte per value. – F Dev Feb 07 '22 at 23:45
  • @CoolBots the logic seemed simpler making bit pairings and pushing those into bytes than creating the bytes themselves, as I would need to iterate through 4 values to create 1 byte, and as I understand it a byte would be constructed from the integer representation of those, unless I am missing an obvious and readable method of constructing those bytes. – F Dev Feb 07 '22 at 23:56
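
The two-bit packing discussed in these comments can also be done with plain shifts, as CoolBots suggests. Below is a minimal sketch, assuming each three-state value is already encoded as an integer 0, 1 or 2. That encoding and the names TwoBitPacker, Pack and Unpack are assumptions for illustration and do not come from the post.

```csharp
using System;

// Hypothetical sketch: packs 2-bit values, four per byte, lowest bits first.
public static class TwoBitPacker
{
    private const int ValuesPerByte = 4; // 8 bits / 2 bits per value

    public static byte[] Pack(byte[] values)
    {
        // Round up so a partially filled final byte is included.
        var bytes = new byte[(values.Length + ValuesPerByte - 1) / ValuesPerByte];
        for (var i = 0; i < values.Length; i++)
        {
            if (values[i] > 3)
            {
                throw new ArgumentOutOfRangeException(nameof(values), "Each value must fit in 2 bits.");
            }

            // Value i occupies bits (i % 4) * 2 and (i % 4) * 2 + 1 of byte i / 4.
            var shift = (i % ValuesPerByte) * 2;
            bytes[i / ValuesPerByte] |= (byte)(values[i] << shift);
        }
        return bytes;
    }

    // `count` is the number of values originally packed; the rest is padding.
    public static byte[] Unpack(byte[] bytes, int count)
    {
        var values = new byte[count];
        for (var i = 0; i < count; i++)
        {
            var shift = (i % ValuesPerByte) * 2;
            values[i] = (byte)((bytes[i / ValuesPerByte] >> shift) & 0b11);
        }
        return values;
    }
}
```

The shifts operate on values rather than on memory, so the resulting bytes do not depend on the endianness of the machine that wrote them.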

1 Answer

To be honest, I don't know the inner workings of BitArray well enough to tell you for sure whether its output is hardware specific. But I can give you some code that will work regardless of the hardware/environment:

private const int bitsPerByte = 8;

// Packs the bools 8 per byte: bit i goes into byte i / 8 at bit position i % 8.
public static byte[] ToByteArray(bool[] bits)
{
    var bitCount = bits.Length;
    // Round up so a partial final byte is included.
    var byteCount = (bitCount + bitsPerByte - 1) / bitsPerByte;
    var bytes = new byte[byteCount];
    for (int i = 0; i < bitCount; i++)
    {
        SetBit(bytes, i, bits[i]);
    }

    return bytes;
}

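// Note: the result has bytes.Length * 8 entries, so padding bits in the
// last byte come back as trailing false values.
public static bool[] ToBoolArray(byte[] bytes)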
{
    var bitCount = bytes.Length * bitsPerByte;
    var bools = new bool[bitCount];
    for (int i = 0; i < bitCount; i++)
    {
        bools[i] = GetBit(bytes, i);
    }

    return bools;
}

This relies on the following helper code:

private static bool GetBit(byte[] bytes, int bitNumber)
{
    var byteNumber = bitNumber / bitsPerByte;
    var bit = bitNumber % bitsPerByte;
    return GetBit(bytes[byteNumber], bit);
}

private static bool GetBit(byte byteValue, int bitNumber) => ((byteValue >> bitNumber) & 1) != 0;

private static void SetBit(byte[] bytes, int bitNumber, bool value)
{
    var byteNumber = bitNumber / bitsPerByte;
    var bit = bitNumber % bitsPerByte;
    SetBit(ref bytes[byteNumber], bit, value);
}

private static void SetBit(ref byte byteValue, int bitNumber, bool value)
{
    var operand = (byte)(1 << bitNumber);
    if (value)
    {
        byteValue |= operand;
    }
    else
    {
        operand = (byte)~operand;
        byteValue &= operand;
    }
}
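
The helpers above place bit i in byte i / 8 at position i % 8, least significant bit first. That is the same layout the BitArray(byte[]) constructor documents, so the two approaches can be compared directly. Below is a sketch that packs the same values both ways and checks that the bytes agree; running it on the target runtime confirms the layout there instead of relying on assumption. The class and method names are hypothetical.

```csharp
using System;
using System.Collections;
using System.Linq;

// Hypothetical comparison helper; not part of the answer's code.
public static class LayoutComparison
{
    // Manual packing with shifts, least significant bit first.
    public static byte[] PackManually(bool[] values)
    {
        var bytes = new byte[(values.Length + 7) / 8];
        for (var i = 0; i < values.Length; i++)
        {
            if (values[i])
            {
                bytes[i / 8] |= (byte)(1 << (i % 8));
            }
        }
        return bytes;
    }

    // Packing via BitArray.CopyTo, as in the question.
    public static byte[] PackWithBitArray(bool[] values)
    {
        var bits = new BitArray(values);
        var bytes = new byte[(bits.Length + 7) / 8];
        bits.CopyTo(bytes, 0);
        return bytes;
    }

    // True when both approaches produce identical bytes for the input.
    public static bool SameLayout(bool[] values)
    {
        return PackManually(values).SequenceEqual(PackWithBitArray(values));
    }
}
```

If SameLayout returns true for representative inputs, data written with one approach can be read with the other, provided the reader knows how many bits were originally stored.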
Nigel