0

I am working on Huffman Compression algorithm. I have the code for each character. For example f=1100
d=111
e=1101
b=101
c=100
a=0

Now in order to achieve compression I need to write the codes as bits to a binary file. I am right now able to wrote them as bytes, which is doing nothing but increasing the size of compressed file. How do I write the codes as bits to binary file in Java?

Maverick
  • 91
  • 3
  • 10

1 Answers1

1

Well if you have the text "fdebcafdbca" you would need to write that as the bits:

110011111011011000110011111011011000

Separated and padded:

11001111 10110110 00110011 11101101 10000000 //4 bits of padding here

In hexadecimal:

CF B6 33 ED 80

So you'd write the byte array of 0xCF 0xB6 0x33 0xED 0x80 to a file. That's 5 bytes = 40 bits, 4 wasted bits. The text originally takes 12 bytes, so not much saving as you need to store the tree as well. You cannot avoid using padding if they don't align to a byte boundary.

Although not recommended at all, if you have a string then you could do this:

public class BitWriter {

    private byte nthBit = 0;
    private int index = 0;
    private byte[] data;

    public BitWriter( int nBits ) {
        this.data = new byte[(int)Math.ceil(nBits / 8.0)];
    }

    public void writeBit(boolean bit) {
        if( nthBit >= 8) {
            nthBit = 0;

            index++;
            if( index >= data.length) {
                throw new IndexOutOfBoundsException();
            }
        }
        byte b = data[index];

        int mask = (1 << (7 - nthBit));

        if( bit ) {
            b = (byte)(b | mask);
        }
        data[index] = b;
        nthBit++;
    }

    public byte[] toArray() {
        byte[] ret = new byte[data.length];
        System.arraycopy(data, 0, ret, 0, data.length);
        return ret;
    }

    public static void main( String... args ) {
        BitWriter bw = new BitWriter(6);
        String strbits = "101010";
        for( int i = 0; i < strbits.length(); i++) {
            bw.writeBit( strbits.charAt(i) == '1');
        }

        byte[] b = bw.toArray();
        for( byte a : b ) {
            System.out.format("%02X", a);
                 //A8 == 10101000

        }
    }

}
Esailija
  • 138,174
  • 23
  • 272
  • 326
  • I guess what I am trying to do is say if I have a string 101010 , I would like to consider each character as a bit and add it to a byte array till the byte array is full. – Maverick Mar 22 '13 at 16:57
  • @Maverick Well that's extremely wasteful, every char takes 16 bits to represent 1 bit. You never need any string representation of bits like that. – Esailija Mar 22 '13 at 17:01
  • @Maverick I have some code in my answer, is that what you mean – Esailija Mar 22 '13 at 17:24
  • Yes this is somewhat I am looking for except the padding that gets added. I do not want it. And I also wanted the byte array to be the same as input string, not in other format.Thanks – Maverick Mar 22 '13 at 17:50
  • @Maverick Sigh. If that was possible I would have done just that... – Esailija Mar 22 '13 at 17:52