
To send a chunk of bits from a four-word String, I get the byte array from the String and build a bit string from it:

StringBuilder binaryStr = new StringBuilder();

byte[] bytesFromStr = str.getBytes("UTF-8");
for (int i = 0, l = bytesFromStr.length; i < l; i++) {
    binaryStr.append(Integer.toBinaryString(bytesFromStr[i]));
}

String result = binaryStr.toString();

The problem arises when I try the reverse operation: converting the bit string back into a Java String decoded as UTF-8.

Can someone explain the best way to do that?

Thanks in advance!

osanchezmon
  • I think this is a duplicate of http://stackoverflow.com/questions/5499924/convert-java-string-to-byte-array; at the very least, I think it will help. – Gavin Jul 09 '16 at 17:23
  • It's impossible to reverse that operation. You can't possibly know if 100011010100110101100100 is the representation of 3 bytes, or 4, or 5, or... What are you trying to achieve? Why are you doing that? – JB Nizet Jul 09 '16 at 17:27
  • If you have the string `"1a"`, it is built from the characters `1` and `a`, which sit in the Unicode table at positions `49` and `97`. As bytes they should be represented as `00110001` `01100001`. But the result of `Integer.toBinaryString(49)` is `110001`, not `00110001` (leading `0`s are dropped). So, as JB Nizet pointed out, it is impossible to tell whether `111` represents `1` `1` `1`, or `11` `1`, or `1` `11`, or `111`. Anyway, what you are doing here looks like an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) – Pshemo Jul 09 '16 at 17:33
  • If I have 4 words encoded with UTF-8, that means I have 4 bytes, if I'm not wrong. In that case I think I can reverse the operation. This is for a PoC about steganography and data exfiltration. – osanchezmon Jul 09 '16 at 17:36
  • "*If I have 4 words encoded with UTF-8, that means I have 4 bytes*": what makes you think so? Can you point us to a resource that gave you that idea? What you are saying could be read as "UTF-8 writes one word in one byte", but think about how many words are out there, and how many values a byte can hold. – Pshemo Jul 09 '16 at 17:37
  • @Pshemo Sorry. I mean encoding a single ASCII char in one byte, and that's enough for this test. – osanchezmon Jul 09 '16 at 17:43
  • OK, but your intention is still unclear. What is the point of converting one String into another String that holds its bit representation? The second string will be 8 times bigger than the first (each character becomes eight new characters, each `'0'` or `'1'`, not bits). If you are sending a string, why not send the original one instead? – Pshemo Jul 09 '16 at 17:50
  • @Pshemo My intention is to replace some unused (or non-significant) bits in a bit-level numeric representation with those bits. That is the point of my steganography test: hide data and recover it. – osanchezmon Jul 09 '16 at 18:03
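To make the ambiguity raised in the comments concrete, here is a small sketch that reproduces the question's approach (unpadded toBinaryString() per byte) on the string "1a". The class and method names are hypothetical, chosen only for illustration:

```java
import java.nio.charset.StandardCharsets;

public class UnpaddedBits {
    // Reproduces the question's approach: concatenate toBinaryString() of each byte
    static String unpadded(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(Integer.toBinaryString(b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = unpadded("1a");
        // '1' is 49 -> "110001" (6 digits), 'a' is 97 -> "1100001" (7 digits)
        System.out.println(bits);          // prints: 1100011100001
        System.out.println(bits.length()); // prints: 13, not a multiple of 8
    }
}
```

Because the digit count per byte varies, the 13-character result cannot be split back into bytes without extra information.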

2 Answers


TL;DR Don't use toBinaryString(). See the solution at the end.


Your problem is that Integer.toBinaryString() doesn't return leading zeroes, e.g.

System.out.println(Integer.toBinaryString(1));   // prints: 1
System.out.println(Integer.toBinaryString(10));  // prints: 1010
System.out.println(Integer.toBinaryString(100)); // prints: 1100100

For your purpose, you want to always get 8 bits for each byte.

You also need to keep negative byte values from being sign-extended to 32 bits, e.g.

System.out.println(Integer.toBinaryString((byte)129)); // prints: 11111111111111111111111110000001

The easiest way to handle both problems is:

Integer.toBinaryString((b & 0xFF) | 0x100).substring(1)

First, it widens the byte b to int and keeps only the lower 8 bits (& 0xFF), which undoes the sign extension. Then it sets the 9th bit (| 0x100), e.g. 129 (decimal) becomes 1 1000 0001 (binary, spaces added for clarity). Finally, substring(1) drops that 9th bit, so the result always has exactly 8 digits with any leading zeroes in place.
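A quick way to see each step of that expression is to print the intermediate 9-digit value next to the final result for a few sample bytes. This is a sketch; the class name and sample values are illustrative choices, and toBinary is the same expression shown above:

```java
public class PaddedBinary {
    // Same expression as in the answer: mask to 8 bits, set bit 8, drop it again
    static String toBinary(byte b) {
        return Integer.toBinaryString((b & 0xFF) | 0x100).substring(1);
    }

    public static void main(String[] args) {
        byte[] samples = { 5, 'A', (byte) 129, -1 };
        for (byte b : samples) {
            int masked = b & 0xFF;           // 0..255, sign extension removed
            int withMarker = masked | 0x100; // always exactly 9 binary digits
            System.out.println(b + " -> " + Integer.toBinaryString(withMarker)
                    + " -> " + toBinary(b));
        }
        // prints:
        // 5 -> 100000101 -> 00000101
        // 65 -> 101000001 -> 01000001
        // -127 -> 110000001 -> 10000001
        // -1 -> 111111111 -> 11111111
    }
}
```

Small values get their leading zeroes back, and negative bytes no longer produce 32 digits.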

It's better to have that as a helper method:

private static String toBinary(byte b) {
    return Integer.toBinaryString((b & 0xFF) | 0x100).substring(1);
}

Your code then becomes:

StringBuilder binaryStr = new StringBuilder();
for (byte b : str.getBytes("UTF-8"))
    binaryStr.append(toBinary(b));
String result = binaryStr.toString();

E.g. if str = "Hello World", you get:

0100100001100101011011000110110001101111001000000101011101101111011100100110110001100100

You could of course just do it yourself, without resorting to toBinaryString():

StringBuilder binaryStr = new StringBuilder();
for (byte b : str.getBytes("UTF-8"))
    for (int i = 7; i >= 0; i--)
        binaryStr.append((b >> i) & 1);
String result = binaryStr.toString();

That will probably run faster too, since it avoids creating an intermediate String for every byte.

Andreas
  • Thanks @Andreas. I will run some tests with your implementation that avoids toBinaryString() and try to recover the information. – osanchezmon Jul 09 '16 at 17:56

Thanks @Andreas for your code. I tested your function and then "decoded" the bit string back into text using this:

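StringBuilder revealStr = new StringBuilder();
// Each 8-bit group is cast straight to a char, so this only works for ASCII (one byte per character)
for (int i = 0; i < result.length(); i += 8) {
    revealStr.append((char) Integer.parseUnsignedInt(result.substring(i, i + 8), 2));
}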
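The loop above relies on the ASCII-only assumption from the comments. For arbitrary UTF-8 text a character can span 2 to 4 bytes, so a safer sketch rebuilds the byte array first and lets new String(bytes, StandardCharsets.UTF_8) reassemble the characters. The class and the encode/decode names are hypothetical; it assumes the bit string contains only '0' and '1' and is 8 bits per byte, most significant bit first:

```java
import java.nio.charset.StandardCharsets;

public class BitStringCodec {
    // Encode: UTF-8 bytes, 8 bits per byte, most significant bit first
    static String encode(String s) {
        StringBuilder bits = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            for (int i = 7; i >= 0; i--) {
                bits.append((b >> i) & 1);
            }
        }
        return bits.toString();
    }

    // Decode: rebuild the byte array first, then let the UTF-8 decoder
    // reassemble multi-byte characters (casting each byte to char would not)
    static String decode(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("bit string length must be a multiple of 8");
        }
        byte[] bytes = new byte[bits.length() / 8];
        for (int i = 0; i < bytes.length; i++) {
            // parseInt yields 0..255; the (byte) cast keeps the low 8 bits
            bytes[i] = (byte) Integer.parseInt(bits.substring(i * 8, i * 8 + 8), 2);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file encoding-independent
        String original = "Hello W\u00f6rld \u20ac";
        String bits = encode(original);
        System.out.println(decode(bits).equals(original)); // prints: true
    }
}
```

Here ö takes 2 bytes and € takes 3, so a per-byte char cast would garble both, while decoding the whole byte array restores them.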

Thanks, everyone, for your help.

osanchezmon