I'm just beginning to learn about file compression and I've run into a bit of a roadblock. I have an application that encodes a string such as "program" into a compressed binary representation, "010100111111011000" (note that this is still stored as a String).

Encoding
g       111
r       10
a       110
p       010
o       011
m       00

Now I need to write this to the file system using a FileOutputStream. The problem I'm having is: how can I convert the string "010100111111011000" to a byte[] that can be written to the file system with FileOutputStream?

I've never worked with bits/bytes before so I'm kind of at a dead end here.

Kevin
John Lotacs
  • You talk about a "compressed binary representation" then say you have a `String` that is 18 characters long ("010100111111011000") to represent a word that is 7 characters long ("program"). Are you sure you mean what you're asking? Normally you would have those bits set in X number of bytes (3 in this case). – Brian Roach Nov 26 '11 at 01:00
  • Look up 'bit shift operators': `>>`, `>>>`, `<<`. – Kevin Nov 26 '11 at 01:00
  • Brian, the original message is 56 bits in size when translated to binary; the encoded message is only 18 bits. Kevin, people keep telling me that, but I still can't draw the link between using those operators and being able to translate this to a byte array. – John Lotacs Nov 26 '11 at 01:03
  • @JohnLotacs - No, it's not, if you're talking about `String`s which you say you are in your question which is the source of confusion. If you have a `String` as you say, you don't have bits. You have a bunch of the characters `0` and `1` (specifically, you have a 16bit Unicode char for each, making your memory use 36 bytes before the overhead of the `String` object) - to be clear, if you have a `String` you have the textual representation of a set of bits, expressed using the characters 0 and 1. – Brian Roach Nov 26 '11 at 01:09
  • Brian, that IS the question, converting a String representation of bits to a set of bytes. – John Lotacs Nov 26 '11 at 01:11
  • @JohnLotacs - you wouldn't, ever, in relation to the things you are talking about. Why do you have a `String` ? – Brian Roach Nov 26 '11 at 01:13
  • Because it was easiest to build that encoding map with a Huffman tree by doing traversals and appending 0/1 to a prefix on a StringBuffer. http://en.wikipedia.org/wiki/Huffman_coding – John Lotacs Nov 26 '11 at 01:26
  • @JohnLotacs Do you still have your final solution somewhere in code? I have the exact same problem, but I can't get it working. – Jim Vercoelen Sep 26 '16 at 20:35

3 Answers

An introduction to bit-shift operators:

First, we have the left-shift operator, x << n. This will shift all the bits in x left by n bits, filling the new bits with zero:

      1111 1111 
<< 3: 1111 1000

Next, we have the signed right-shift operator, x >> n. This shifts all the bits in x right by n, copying the sign bit into the new bits:

      1111 1111 
>> 3: 1111 1111

      1000 0000
>> 3: 1111 0000

      0111 1111 
>> 3: 0000 1111

Finally, we have the zero-fill right-shift operator, x >>> n. This shifts all bits in x right by n bits, filling the new bits with zero:

       1111 1111 
>>> 3: 0001 1111
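
A minimal sketch of the three shift operators in Java, using the same values as the diagrams above. The class name `ShiftDemo` and the helper `bits8` are made up for this illustration. One thing the diagrams gloss over: Java promotes a `byte` to a 32-bit `int` before shifting, so `>>>` only behaves as drawn if you first mask the value to 8 bits with `& 0xFF`.

```java
class ShiftDemo {
    // Hypothetical helper: renders the low 8 bits of a value, e.g. "1111 1000".
    static String bits8(int value) {
        String s = Integer.toBinaryString(value & 0xFF);
        s = "00000000".substring(s.length()) + s; // left-pad to 8 digits
        return s.substring(0, 4) + " " + s.substring(4);
    }

    public static void main(String[] args) {
        byte all = (byte) 0xFF;  // 1111 1111
        byte high = (byte) 0x80; // 1000 0000

        // Left shift: zeros enter from the right.
        System.out.println(bits8(all << 3));            // 1111 1000
        // Signed right shift: the sign bit is copied in from the left.
        System.out.println(bits8(high >> 3));           // 1111 0000
        // Zero-fill right shift, masked to 8 bits first.
        System.out.println(bits8((all & 0xFF) >>> 3));  // 0001 1111
        // Without the mask, the promoted int is all ones, so the low
        // 8 bits of the result are still all ones.
        System.out.println(bits8(all >>> 3));           // 1111 1111
    }
}
```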

You may also find useful the bitwise-or operator, x | y. This compares the bits in each position in x and y, setting the new number's bit on if it was on in either x or y, off otherwise:

  1010 0101
| 1010 1010
  ---------
  1010 1111

You should only need the previous operators for the problem at hand, but for the sake of completeness, here are the last two:

The bitwise-and operator, x & y sets the bits in the output to one if and only if the bit is on in both x and y:

  1010 0101
& 1010 1010
  ---------
  1010 0000

The bitwise-xor operator, x ^ y sets the output bits to one if the bit is on in one number or the other but not both:

  1010 0101
^ 1010 1010
  ---------
  0000 1111
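
The same three bitwise operations can be checked in Java with a short sketch like the one below. `BitwiseDemo` and `bits8` are names invented for this example; the operands are the ones from the diagrams above, held in `int` variables to keep sign handling out of the picture.

```java
class BitwiseDemo {
    // Hypothetical helper: renders the low 8 bits of a value, e.g. "1010 1111".
    static String bits8(int value) {
        String s = Integer.toBinaryString(value & 0xFF);
        s = "00000000".substring(s.length()) + s; // left-pad to 8 digits
        return s.substring(0, 4) + " " + s.substring(4);
    }

    public static void main(String[] args) {
        int x = 0b1010_0101;
        int y = 0b1010_1010;

        System.out.println(bits8(x | y)); // 1010 1111  (on in either)
        System.out.println(bits8(x & y)); // 1010 0000  (on in both)
        System.out.println(bits8(x ^ y)); // 0000 1111  (on in exactly one)
    }
}
```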

Now, applying these to the situation at hand:

You will need to use the bit-shift operators to add and manipulate bits. Start setting bits at the right side according to their string representations and shift them over. Continue until you hit the end of a byte, and then move to the next byte. Say we want to create a byte representation of "1100 1010":

Our byte    Target
---------   --------
0000 0000
            1100 1010
0000 0001   ^
            1100 1010
0000 0011    ^
            1100 1010
0000 0110     ^
            1100 1010
0000 1100      ^
            1100 1010
0001 1001        ^
            1100 1010
0011 0010         ^
            1100 1010
0110 0101          ^
            1100 1010
1100 1010           ^

I will, of course, leave it to you to apply this to your work.
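
If you want something to check your own attempt against, here is one possible sketch of the shift-and-set loop traced above. It is not the only way to do it. The class and method names (`BitPacker`, `pack`) are invented for this example, and padding the final partial byte with zero bits on the right is an assumption of this sketch; the question doesn't say how leftover bits should be handled.

```java
class BitPacker {
    // Packs a string of '0'/'1' characters into bytes, first character
    // becoming the most significant bit of the first byte.
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8]; // round up to whole bytes
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c != '0' && c != '1') {
                throw new IllegalArgumentException("Not a bit: " + c);
            }
            int current = out[i / 8] << 1; // shift the bits set so far to the left
            if (c == '1') {
                current |= 1;              // set the new rightmost bit
            }
            out[i / 8] = (byte) current;   // the cast keeps only the low 8 bits
        }
        int remainder = bits.length() % 8;
        if (remainder != 0) {
            // Assumption: left-align the last partial byte, padding with zeros.
            out[out.length - 1] <<= (8 - remainder);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = pack("010100111111011000"); // "program" from the question
        for (byte b : packed) {
            System.out.printf("%02X ", b & 0xFF);   // prints: 53 F6 00
        }
        System.out.println();
    }
}
```

Because the 18 bits don't fill three bytes exactly, a decoder would also need to know how many bits are real data, for example by storing the bit count alongside the bytes.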

Kevin
  • One question: is starting my byte as 0000 0001 the same as writing `byte b = 1;`? I'm unsure because `byte` is signed and I don't know which bit represents the sign, so I can't tell what the binary representation is. – John Lotacs Nov 26 '11 at 02:25
  • You could do that, but for consistency you will want to start with a zero byte and then enter a `for` or `while` loop. I'll edit the example a bit to see if I can make this a bit more clear. – Kevin Nov 26 '11 at 02:29

Chop your String up into lengths of 8 and call Byte#parseByte. If you set the radix to 2, it will parse the String as a binary number.

Jeffrey
  • `Exception in thread "main" java.lang.NumberFormatException: Value out of range. Value:"10000000" Radix:2` It works only on lengths of 7 unless there are leading zeros, any idea? – John Lotacs Nov 26 '11 at 02:04
  • @John Lotacs I have no idea why it's doing this, but you can use [`Integer#parseInt`](http://tinyurl.com/7uo6b5t) and cast it to `byte` as a workaround. – Jeffrey Nov 26 '11 at 02:16
  • @jeff It's doing that because `byte` is signed, so it needs to be `-1000 0000` to `+111 1111` (-128 to +127). A byte with bits of `1000 0000` is actually -128, and would have to be fed to the parser as `-1000 0000`. – Kevin Nov 26 '11 at 03:25
  • @Kevin Why can't it just take `1000 0000`? Is it just a bit of laziness on the coder's part or am I missing something? – Jeffrey Nov 26 '11 at 04:07
  • The `parseByte` method parses the value of the text, not the individual bits. `1000 0000` is 128, which is out of bounds for a `byte`, which has a max of 127. It would be in range for an `unsigned byte`, but Java doesn't have unsigned types (except, I believe, `char`). – Kevin Nov 26 '11 at 04:16
  • @Kevin Ahhhh, now I see. Yeah, `char` is unsigned. – Jeffrey Nov 26 '11 at 04:50
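
To make the workaround from the comments concrete, here is a small sketch that parses an 8-character chunk with `Integer.parseInt` and radix 2, then casts to `byte`. The class and method names (`ParseChunk`, `parseBits`) are invented for this example, and the strict 8-character check is an assumption of the sketch rather than anything required by the API.

```java
class ParseChunk {
    // Parses exactly eight '0'/'1' characters into a byte.
    static byte parseBits(String eightBits) {
        if (!eightBits.matches("[01]{8}")) {
            throw new IllegalArgumentException("Expected 8 binary digits: " + eightBits);
        }
        // parseInt yields 0..255; the cast keeps the low 8 bits, so
        // "10000000" (128) becomes the byte -128 with the same bit pattern.
        return (byte) Integer.parseInt(eightBits, 2);
    }

    public static void main(String[] args) {
        byte b = parseBits("10000000");
        System.out.println(b);        // -128
        System.out.println(b & 0xFF); // 128, the unsigned view of the same bits
    }
}
```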

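I guess you want to write these zeros and ones as binary values in a file. If so, you can iterate over the string, taking 8 characters at a time (with String.substring() or similar), and create a byte from each chunk. Note that the Byte(String) constructor parses its argument as a decimal number, so parse with radix 2 instead, e.g. Integer.parseInt(chunk, 2) cast to byte. It's the easiest solution that comes to my mind for now.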

If I've misunderstood the problem, please tell me more about it.
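
A rough sketch of the substring approach, including the write with FileOutputStream that the question asks about. The names `BitFileWriter`, `toBytes` and `write`, and the file name `program.bin`, are all made up for this example. Padding the last chunk with zeros is an assumption of the sketch, and each chunk is parsed with `Integer.parseInt(chunk, 2)` so that chunks starting with `1` don't overflow.

```java
import java.io.FileOutputStream;
import java.io.IOException;

class BitFileWriter {
    // Splits the bit string into 8-character chunks and converts each to a byte.
    static byte[] toBytes(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < out.length; i++) {
            int end = Math.min(bits.length(), (i + 1) * 8);
            StringBuilder chunk = new StringBuilder(bits.substring(i * 8, end));
            while (chunk.length() < 8) {
                chunk.append('0'); // assumption: pad the final chunk with zeros
            }
            out[i] = (byte) Integer.parseInt(chunk.toString(), 2);
        }
        return out;
    }

    // Writes the packed bytes to the given path, closing the stream afterwards.
    static void write(String bits, String path) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(path)) {
            fos.write(toBytes(bits));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; the 18 bits end up as 3 bytes on disk.
        write("010100111111011000", "program.bin");
    }
}
```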

Jakub Matczak