
I'm reading data out of a ByteBuffer, but the values I need are stored as 10-bit numbers, i.e.:

1100101110 = 814

I've tried using BitSet, but it stores each byte LSB first, which makes the final two bits incorrect. The example above turns into:

1100101111 = 815

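An example using bit positions numbered 01 to 16 (the first | marks the 8-bit byte boundary, the second marks the end of the 10-bit number):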

                     8 bit    10 bit
01 02 03 04 05 06 07 08 | 09 10 | 11 12 13 14 15 16

ends up as:

                     8 bit    10 bit
08 07 06 05 04 03 02 01 | 16 15 | 14 13 12 11 10 09

So I can handle the first byte in either order, but the final two bits are taken from the wrong end of the next byte.
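The reordering above matches how `BitSet.valueOf(byte[])` is specified to number bits: little-endian, so index `n` is bit `n % 8` of byte `n / 8` and index 0 is the least-significant bit of the first byte. Assuming `getBitSet` is built on `BitSet.valueOf`, the sketch below reproduces the problem on the bytes `11001011 10110111` (the first two bytes of the worked example further down) and contrasts it with an MSB-first mapping that reads bit `i` from byte `i / 8` at position `7 - i % 8`. The class `BitOrderDemo` and its methods are hypothetical, for illustration only.

```java
import java.util.BitSet;

// Hypothetical demo class (not from the question): compares BitSet's
// little-endian bit numbering with the MSB-first order the data uses.
class BitOrderDemo {

    // Bit i counted from the most significant bit of bytes[0] (MSB first).
    static boolean msbFirstBit(byte[] bytes, int i) {
        return ((bytes[i >>> 3] >> (7 - (i & 7))) & 1) != 0;
    }

    // First n bits as a string, in BitSet index order (index 0 = LSB of bytes[0]).
    static String bitSetOrder(byte[] bytes, int n) {
        BitSet bs = BitSet.valueOf(bytes);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(bs.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    // First n bits as a string, in MSB-first order.
    static String msbFirstOrder(byte[] bytes, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(msbFirstBit(bytes, i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0b11001011, (byte) 0b10110111 };
        System.out.println(bitSetOrder(bytes, 10));   // 1101001111: byte 0 reversed, then bits 0-1 of byte 1
        System.out.println(msbFirstOrder(bytes, 10)); // 1100101110 = 814
    }
}
```

The MSB-first helper gives back the intended `1100101110 = 814`, while the BitSet order shows both symptoms described above: the first byte reversed and the last two bits taken from the low end of the next byte.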

My current code is as follows:

// bb is the ByteBuffer in use.
// length is the number of bytes covering the 10-bit numbers.
BitSet data = getBitSet(bb, length);
// Count from the byte length: data.length() stops at the highest set bit,
// so trailing zero bits would be lost.
int totalNumbers = (length * 8) / 10;
int[] numbers = new int[totalNumbers];
for (int i = 0; i < totalNumbers; i++) {
    int start = i * 10;
    int end = (i + 1) * 10;
    BitSet bs = data.get(start, end); // bits [start, end), re-indexed from 0
    int tenBitNumber = 0;
    // Treat index j of the slice as bit (9 - j) of the result, i.e. MSB first.
    for (int j = bs.nextSetBit(0); j >= 0; j = bs.nextSetBit(j + 1)) {
        double power = Math.pow(2, 9 - j);
        tenBitNumber += power;
    }
    numbers[i] = tenBitNumber;
}

A worked example in big-endian format. The sequence of bytes:

11001011|10110111|00101100|11000111

is transformed by BitSet into:

11010011|11101101|00110100|11100011
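Comparing the two lines byte by byte, each byte has simply had its bit order reversed, which is what BitSet's LSB-first numbering would produce. One possible workaround that keeps a BitSet, sketched under the assumption that the packed bytes are available as a `byte[]`: reverse the bits of every byte first (here with `Integer.reverse`), so that BitSet index `j` becomes MSB-first position `j`. The class `ReversedBitSet` and its methods are hypothetical, and the value loop builds each number with shifts instead of `Math.pow`.

```java
import java.util.BitSet;

// Hypothetical helper class: reverses each byte so BitSet indices run MSB first.
class ReversedBitSet {

    // Reverses the bit order within one byte, e.g. 11001011 -> 11010011.
    static int reverseByte(int b) {
        // Integer.reverse moves bits 0-7 to bits 31-24; shift them back down.
        return Integer.reverse(b & 0xff) >>> 24;
    }

    // Builds a BitSet whose index 0 is the most significant bit of bytes[0].
    static BitSet msbFirstBitSet(byte[] bytes) {
        byte[] reversed = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            reversed[i] = (byte) reverseByte(bytes[i]);
        }
        return BitSet.valueOf(reversed);
    }

    // Reads consecutive 10-bit values. The count comes from the byte length
    // because BitSet.length() ignores trailing zero bits.
    static int[] readTenBit(byte[] bytes) {
        BitSet data = msbFirstBitSet(bytes);
        int[] numbers = new int[bytes.length * 8 / 10];
        for (int i = 0; i < numbers.length; i++) {
            int value = 0;
            for (int j = 0; j < 10; j++) {
                // Shift in one bit at a time, most significant first.
                value = (value << 1) | (data.get(i * 10 + j) ? 1 : 0);
            }
            numbers[i] = value;
        }
        return numbers;
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0b11001011, (byte) 0b10110111,
                         (byte) 0b00101100, (byte) 0b11000111 };
        System.out.println(java.util.Arrays.toString(readTenBit(bytes))); // [814, 882, 817]
    }
}
```

On the four bytes above this yields `[814, 882, 817]`, with the last two bits (`11`) left over because 32 bits hold only three full 10-bit values.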

What would be the best solution? I need to read multiple 10-bit numbers from the ByteBuffer.

Tom
  • The typical solution is to buffer the bits in an `int`, extract slices of 10 from it, and append a new byte whenever necessary. Not this weird bit-by-bit floating-point stuff; it's obfuscated and slow. – harold Oct 12 '16 at 10:51
  • Where does `10` in the first example come from? Is it the MSB of the next byte? If the answer is "yes", what about the following 6 bits? – Sergey Kalinichenko Oct 12 '16 at 10:52
  • Let's say you have a sequence of bytes `aaaaaaaa bbbbbbbb cccccccc dddddddd eeeeeeee`. Do you need to re-partition it as `aaaaaaaabb bbbbbbcccc ccccdddddd ddeeeeeeee`? – Sergey Kalinichenko Oct 12 '16 at 10:55
  • @dasblinkenlight yes, but I think harold is on the right track for what I want to do. Thanks for the help! :) – Tom Oct 12 '16 at 10:57
  • http://stackoverflow.com/questions/3842828/converting-little-endian-to-big-endian – TeWu Oct 12 '16 at 10:58
  • @Tom harold's talk about "extracting slices from `int`" and "appending a byte" to the output led me to believe that he is describing the opposite process. – Sergey Kalinichenko Oct 12 '16 at 11:02
  • @dasblinkenlight oh well yeah, I don't have any control over how the 10-bit numbers have been stored, as I'm extracting data from an archive that has compressed these numbers to save space. – Tom Oct 12 '16 at 11:06
  • Are ten-bit numbers stored the way I show in the example above, with least-significant two bits of the first ten-bit number "glued" to the next byte? Is there any inversion of bits going on? – Sergey Kalinichenko Oct 12 '16 at 11:09
  • @dasblinkenlight I meant reading a byte from the input and appending it to the buffer; that's what the OP needs, right? But yeah, it works about the same way in reverse. – harold Oct 12 '16 at 11:10
  • Yes, that's correct: the least-significant two bits are the first two bits of the next byte. There's no inversion in the raw data; that only happens when I use `BitSet`, which reverses the bits into little-endian order. – Tom Oct 12 '16 at 11:12
  • Please clarify the first sentence. Whilst MSB and LSB make sense in serial protocols where transmission order matters, within a single byte from a file that's relatively meaningless - the MSB is always the one with bit mask `0x80`. – Alnitak Oct 12 '16 at 11:16
  • Within a single byte, yes, but I have n bits overflowing into the next one, and it's those n extra bits that I need to extract. I'll attempt to clarify in the question. – Tom Oct 12 '16 at 11:20
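The buffering approach harold describes could look roughly like the sketch below. It assumes the numbers are packed MSB first with no gaps (as Tom confirms in the comments) and that `bb` is positioned at the first packed byte with at least `length` bytes remaining; the class name `TenBitReader` is hypothetical. Each byte is shifted into an `int`, and whenever 10 or more bits are pending, the oldest 10 are sliced off.

```java
import java.nio.ByteBuffer;

// Hypothetical reader class: buffers bits in an int and slices off 10 at a time.
class TenBitReader {

    // Reads (length * 8) / 10 values packed MSB first from the next `length`
    // bytes of bb. Any final bits that do not fill a whole value are dropped.
    static int[] readTenBit(ByteBuffer bb, int length) {
        int[] numbers = new int[length * 8 / 10];
        int buffer = 0;   // pending bits sit in the low bitCount bits; higher bits are stale
        int bitCount = 0; // how many pending bits the buffer holds
        int out = 0;
        for (int i = 0; i < length; i++) {
            // Append the next byte below the bits already pending.
            buffer = (buffer << 8) | (bb.get() & 0xff);
            bitCount += 8;
            if (bitCount >= 10) {
                bitCount -= 10;
                // The oldest 10 pending bits are the highest ones; mask off stale bits.
                numbers[out++] = (buffer >>> bitCount) & 0x3ff;
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0b11001011, (byte) 0b10110111,
                         (byte) 0b00101100, (byte) 0b11000111 };
        int[] values = readTenBit(ByteBuffer.wrap(bytes), bytes.length);
        System.out.println(java.util.Arrays.toString(values)); // [814, 882, 817]
    }
}
```

Fewer than 10 bits are ever pending when a new byte arrives, so at most one value comes out per byte and the `int` never needs more than 16 live bits.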



First, let's deal with the situation where five bytes (40 bits, or four 10-bit numbers) are available: split the input into chunks of five bytes. Each chunk produces a group of four 10-bit numbers:

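// Unpacks 5 bytes (40 bits, MSB first) into four 10-bit values.
int[] convertFive(byte a, byte b, byte c, byte d, byte e) {
    int p = ((a & 0xff) << 2) | (b & 0xc0) >>> 6; // all 8 bits of a, top 2 bits of b
    int q = ((b & 0x3f) << 4) | (c & 0xf0) >>> 4; // low 6 bits of b, top 4 bits of c
    int r = ((c & 0x0f) << 6) | (d & 0xfc) >>> 2; // low 4 bits of c, top 6 bits of d
    int s = ((d & 0x03) << 8) | (e & 0xff) >>> 0; // low 2 bits of d, all 8 bits of e
    return new int[] { p, q, r, s };
}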

Append these four ints to the output to produce the final result. You can modify the method to append output as you go, instead of creating four-element arrays all the time.
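A driver along those lines might look like this: it walks the buffer in five-byte chunks and writes each group of four values straight into one output array instead of allocating four-element arrays. It is a sketch that assumes the packed data is a whole number of 5-byte chunks; the class `FiveByteChunks` and the method `convertAll` are hypothetical, and the `convertFive` variant here writes into the output array rather than returning one.

```java
import java.nio.ByteBuffer;

// Hypothetical driver class for the five-byte chunking approach.
class FiveByteChunks {

    // Same bit arithmetic as convertFive, but writes into out[pos..pos + 3].
    static void convertFive(byte a, byte b, byte c, byte d, byte e, int[] out, int pos) {
        out[pos]     = ((a & 0xff) << 2) | ((b & 0xc0) >>> 6);
        out[pos + 1] = ((b & 0x3f) << 4) | ((c & 0xf0) >>> 4);
        out[pos + 2] = ((c & 0x0f) << 6) | ((d & 0xfc) >>> 2);
        out[pos + 3] = ((d & 0x03) << 8) | (e & 0xff);
    }

    // Converts the next `length` bytes of bb into length / 5 * 4 values.
    // Assumes length is a multiple of 5; a shorter tail needs separate handling.
    static int[] convertAll(ByteBuffer bb, int length) {
        if (length % 5 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 5: " + length);
        }
        int[] out = new int[length / 5 * 4];
        for (int pos = 0; pos < out.length; pos += 4) {
            // Java evaluates arguments left to right, so bytes are read in buffer order.
            convertFive(bb.get(), bb.get(), bb.get(), bb.get(), bb.get(), out, pos);
        }
        return out;
    }
}
```

For example, the bytes `11001011 10110111 00101100 11000111 00000000` come out as `[814, 882, 817, 768]`.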

Deal with the remaining chunk of less than five bytes in a similar way: two bytes become one 10-bit number, three bytes become two numbers, and four bytes become three numbers. If the remainder is one-byte long, the input is invalid.
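One way to handle that tail, assuming it has already been copied into a `byte[]` of 2 to 4 bytes: zero-pad it to five bytes, run the same bit arithmetic, and keep only the first `tail.length * 8 / 10` values. The padding is a shortcut of this sketch rather than something the answer prescribes, and `TailChunk`/`convertTail` are hypothetical names. It works because the zero bytes only feed values that get discarded (for example, a 2-byte tail keeps only `p`, which uses just `a` and `b`).

```java
import java.util.Arrays;

// Hypothetical helper class for the 2- to 4-byte remainder.
class TailChunk {

    // Same arithmetic as the answer's convertFive.
    static int[] convertFive(byte a, byte b, byte c, byte d, byte e) {
        int p = ((a & 0xff) << 2) | ((b & 0xc0) >>> 6);
        int q = ((b & 0x3f) << 4) | ((c & 0xf0) >>> 4);
        int r = ((c & 0x0f) << 6) | ((d & 0xfc) >>> 2);
        int s = ((d & 0x03) << 8) | (e & 0xff);
        return new int[] { p, q, r, s };
    }

    // 2 bytes -> 1 value, 3 bytes -> 2 values, 4 bytes -> 3 values.
    static int[] convertTail(byte[] tail) {
        if (tail.length < 2 || tail.length > 4) {
            // Fewer than 2 bytes cannot hold a 10-bit value; 5 or more belong to a full chunk.
            throw new IllegalArgumentException("tail must be 2 to 4 bytes, got " + tail.length);
        }
        // Zero-pad to five bytes; padding only feeds values that are discarded below.
        byte[] padded = Arrays.copyOf(tail, 5);
        int[] all = convertFive(padded[0], padded[1], padded[2], padded[3], padded[4]);
        return Arrays.copyOf(all, tail.length * 8 / 10);
    }
}
```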

Sergey Kalinichenko