13

I am implementing the huffman algorithm in C. I have got the basic functionality down up to the point where the binary codewords are obtained. so for example, abcd will be 100011000 or something similar. now the question is how do you write this code in binary form in the compressed file. I mean if I write it normally each 1 and 0 will be one character so there is no compression.

I need to write those 1s and 0s in their bit form. is that possible in C. if so how?

A-Sharabiani
  • 17,750
  • 17
  • 113
  • 128
sfactor
  • 12,592
  • 32
  • 102
  • 152
  • well i was merely asking how to do in this situation.writing the codes simply as ascii does not serve the purpose. there must be some other way. – sfactor Dec 06 '09 at 20:39
  • you should generate an int instead of a char* in the encoding function, or else write a function that will convert the string into an int or long representing that sequence of bits. – Vinko Vrsalovic Dec 06 '09 at 20:40
  • 1
    @Vinko: converting to number and then storing is no good. He would have to be careful about sign and endianness, not to mention architecture (int can be different size on different arch). Unsigned char is safest choice. – Stan Dec 06 '09 at 20:46
  • @Stan: Correct. My idea was along the lines of what Nils wrote, even if the implementation details I suggested were not appropriate. This is, do not write the ASCII value of each 1 or 0. – Vinko Vrsalovic Dec 06 '09 at 20:57

1 Answers1

22

Collect bits until you have enough bits to fill a byte and then write it..

E.g. something like this:

int current_bit = 0;
unsigned char bit_buffer;

FILE *f;

void WriteBit (int bit)
{
  if (bit)
    bit_buffer |= (1<<current_bit);

  current_bit++;
  if (current_bit == 8)
  {
    fwrite (&bit_buffer, 1, 1, f);
    current_bit = 0;
    bit_buffer = 0;
  }
}

Once you're done writing your bits you have to flush the bit-buffer. To do so just write bits until current_bit equals to zero:

void Flush_Bits (void)
{
  while (current_bit) 
    WriteBit (0);
}
Nils Pipenbrinck
  • 83,631
  • 31
  • 151
  • 221
  • thanks for pointing this out...so how to end the file then?...i presume we need to do this ourselves in this case. – sfactor Dec 07 '09 at 17:49
  • just call Flush_Bits as defined above. – Nils Pipenbrinck Dec 07 '09 at 18:29
  • There is a problem in the code: You only shift, when a bit is set. Assume you want to output `10000000`. The 1 will never reach the most significant bit position, right? – Dr. Jan-Philip Gehrcke Jun 05 '11 at 14:30
  • 1
    Therefore, the first two code lines in `WriteBit()` must be `bit_buffer <<= 1; if (bit) bit_buffer |= 0x1;` – Dr. Jan-Philip Gehrcke Jun 05 '11 at 14:33
  • I have implemented Huffman in C++ (http://www.Planet-Source-Code.com/vb/scripts/ShowCode.asp?txtCodeId=9737&lngWId=3 in case you're interested). You should end the encoded file with trailing 0s (in case the number of encoded bits is not a multiple of 8). If you implement the Decoding phase correctly, then it will simply and transparently ignore the redundant 0s at the end, because the number of symbols left to decode will be 0 at that point. – barak manos Jan 02 '14 at 08:07