8

I'm thinking about writing some data into a bit stream using C. There are two ways come in mind. One is to concatenate variable bit-length symbols into a contiguous bit sequence, but in this way my decoder will probably have a hard time separating those symbols from this continuous bit stream. Another way is to distribute same amount of bits for which symbol and in that way the decoder can easily recover the original data, but there may be a waste of bits since the symbols have different values which in turn cause many bits in the bit stream being zero(this waste bits I guess).

Any hint what I should do?

I'm new to programming. Any help will be appreciated.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
user1530192
  • 81
  • 2
  • 3
  • Here's my answer to similar question here: http://stackoverflow.com/questions/11253123/how-can-i-print-a-bit-instead-of-byte-in-a-file/11253310#11253310 – Viktor Latypov Jul 16 '12 at 22:31
  • The usual way is to pack the bits, but that requires logic to know the bit count on the other side. You might end up decoding bit by bit to know when you've reached the end of a symbol. – Mark Ransom Jul 16 '12 at 22:33
  • 1
    Your question is related to the field of coding. Huffman coding, as mentioned below, is one option. But there are others as Huffman coding isn't the only one (but it is certainly the most popular). See the book "Compression and Coding Algorithms" by Moffat and Turpin. Most compression books have something about coding; this book focusses on coding. In terms of "hard time separating", you need a code that is prefix-free -- no code is a prefix of any other. – Ray Aug 07 '12 at 03:42

1 Answers1

4

Sounds like your trying to do something similiar to a Huffman compression scheme? I would just go byte-by-byte (char)and keep track of the offset within the byte where I read off the last symbol.

Assuming none of your symbols would be bigger than char. It would look something like this:

struct bitstream {
   char *data;
   int data_size;           // size of 'data' array
   int last_bit_offset;     // last bit in the stream 

   int current_data_offset; // position in 'data', i.e. data[current_data_offset] is current reading/writing byte
   int current_bit_offset;  // which bit we are currently reading/writing
}

char decodeNextSymbol(bitstream *bs) {

}

int encodeNextSymbol(bitstream *bs, char symbol) {

}

The matching code for decodeNextSymbol and encodeNextSymbol would have to use the C bitwise operations ('&' (bitwise AND), and '|' (bitwise OR) for example. I would then come up with a list of all my symbols, starting with the shortest first, and do a while loop that matches the shortest symbol. For example, if one of your symbols is '101', then if the stream is '1011101', it would match the first '101' and would continue to match the rest of the stream '1101' You would also have to handle the case where your symbol values overflow from one byte to the next.