I am doing a project that requires me to convert a UTF-8 string stored in a Windows text file into a continuous binary string and store it in a Windows text file, and then read this binary string, convert it back to the original UTF-8 string, and store it in a text file. I converted the UTF-8 string to a binary string but have no idea how to reverse the process.

Here's my program to convert a UTF-8 string to binary strings.

package filetobits;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class FileToBits {

    public static void main(String[] args) throws IOException, FileNotFoundException {

        FileReader inputStream = new FileReader("C:\\FileTesting\\UTF8.txt");
        FileWriter outputStream = new FileWriter("C:\\FileTesting\\BinaryStrings.txt");

        int c;

        while ((c = inputStream.read()) != -1) {

            outputStream.write(Integer.toBinaryString(c));
            outputStream.write(System.lineSeparator());
        }
        inputStream.close();
        outputStream.close();
    }
}

Here's my input (16 characters):

¼¹¨'I.p

Here's my output:

1111111111111101 10000 1111111111111101 11111 1111111111111101 100111 1001001 101110 1110000 111100 1111111111111101 1100001 101100 101001 1111111111111101 1111111111111101

I need help converting these binary strings back to a single UTF-8 string and storing it in a text file.

I achieved what I want with the following code:

    String str = "";
    FileReader inputStream = new FileReader("C:\\FileTesting\\Encrypted.txt");
    FileWriter outputStream = new FileWriter("C:\\FileTesting\\EncryptedBin.txt");
    int c;
    while ((c = inputStream.read()) != -1) {
        String s = String.format("%16s", Integer.toBinaryString(c)).replace(' ', '0');
        for (int i = 0; i < s.length() / 16; i++) {
            int a = Integer.parseInt(s.substring(16 * i, (i + 1) * 16), 2);
            str += (char) (a);
        }
    }

But the problem is that I can't add extra 0s to pad every binary string to a length of 16, because I need to store this data in an image (for my image steganography project), so the shorter the binary string the better.

I need to get the same output produced by the above code, but without padding every binary string to a length of 16.

PS: I am kind of lost when it comes to character encodings. Does storing UTF-8 characters in a Windows txt file convert them to ANSI or something?

NoobScript
  • I think this is what you needed https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes() and in reverse process you can simply create a new String instance from bytes. – eatSleepCode Jan 28 '17 at 12:43
  • Possible duplicate of [How to parse String as Binary and convert it to UTF-8 equivalent in Java?](http://stackoverflow.com/questions/35264002/how-to-parse-string-as-binary-and-convert-it-to-utf-8-equivalent-in-java) – Sabir Khan Jan 28 '17 at 12:45
  • @SabirKhan i'm sorry but it's not a duplicate, i can't afford to write extra bits to make every character 16 bits binary value. – NoobScript Jan 28 '17 at 12:51
  • Exactly where did you see that UTF-8 requires 16 bits per character? The number of bytes required per character varies and depending what these byte values are, the decoder knows how many bytes to read to recreate a character. By the way, are you ACTUALLY required to write the binary string of your UTF-8 string to a file, or is this an intermediate step of your own to make sure the conversion is correct before you embed the bits to your image? Because in that case, you don't even need a bit string, just the bytes array. – Reti43 Jan 28 '17 at 13:04
  • @Reti43 you're right. i don't need to convert it into binary string. i just don't know how to embed these bytes directly into LSB's of pixels. so i thought it would be easier to first convert it into a string of 0's and 1's then embed bit by bit. – NoobScript Jan 28 '17 at 13:09
  • @Reti43 thanx, i did not know decoder automatically recreates characters from multiple bytes. sorry for wasting your time. – NoobScript Jan 28 '17 at 13:16
  • If you're embedding your bits in sequential pixel order one at a time, you can do something as simple as [this](http://stackoverflow.com/a/26616856/2243104). Or you could implement an [iterator](http://stackoverflow.com/a/1034888/2243104), which contains the same idea and you can generalise it so that it returns `n` bits every time you call `next()`. And assuming the pixel value is between 0 and 255, you embed it like so `pixel = pixel & mask | myIterator.next();`, where `mask` zeroes out the last `n` bits of `pixel`. For example, 0xfe (11111110 in binary) zeroes out the last bit. – Reti43 Jan 28 '17 at 13:36
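
The masking idea from the last comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the pixels are plain `int` values in the range 0..255 (one channel per array element), one bit is hidden per pixel, and bits are taken most significant first. The class and method names (`LsbSketch`, `embed`, `extract`) are made up for this example and do not come from the linked answers.

```java
// Hypothetical sketch: hide each bit of a byte[] in the lowest bit of one pixel value.
class LsbSketch {

    // Returns a copy of pixels with the bits of data written into the least significant bits.
    static int[] embed(int[] pixels, byte[] data) {
        if (data.length * 8 > pixels.length) {
            throw new IllegalArgumentException("not enough pixels for the payload");
        }
        int[] out = pixels.clone();
        for (int i = 0; i < data.length * 8; i++) {
            // Take bit i of the payload, most significant bit of each byte first.
            int bit = (data[i / 8] >> (7 - i % 8)) & 1;
            // 0xFE (11111110) clears the last bit, then the payload bit is OR-ed in.
            out[i] = (out[i] & 0xFE) | bit;
        }
        return out;
    }

    // Reads byteCount bytes back out of the least significant bits.
    static byte[] extract(int[] pixels, int byteCount) {
        byte[] data = new byte[byteCount];
        for (int i = 0; i < byteCount * 8; i++) {
            data[i / 8] = (byte) ((data[i / 8] << 1) | (pixels[i] & 1));
        }
        return data;
    }

    public static void main(String[] args) {
        int[] pixels = new int[16];
        java.util.Arrays.fill(pixels, 255);
        byte[] payload = {0x41, (byte) 0xC2};
        int[] stego = embed(pixels, payload);
        System.out.println(java.util.Arrays.equals(extract(stego, 2), payload));
    }
}
```

Note that the receiver has to know how many bytes to extract, so in practice the payload length would have to be stored or agreed on somehow; that part is left out here.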

1 Answer

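A byte has 8 bits. As a first step, ignore the UTF-8 issue and just fill a byte[] with the data from your binary string.

Once you have the byte[] data, you can use new String(data, StandardCharsets.UTF_8) to decode it into a String. (Java Strings are UTF-16 internally, and new String(data) without a charset uses the platform default charset, which is not necessarily UTF-8.)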
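
A minimal sketch of that round trip, assuming the binary string is written with exactly 8 digits per UTF-8 byte so that no separators are needed to split it again. The class and method names (`Utf8Bits`, `toBits`, `fromBits`) are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: UTF-8 text <-> continuous string of '0'/'1' characters.
class Utf8Bits {

    // Encodes the text as UTF-8 and writes 8 binary digits per byte.
    static String toBits(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder bits = new StringBuilder(bytes.length * 8);
        for (byte b : bytes) {
            // Mask with 0xFF so negative bytes become 0..255 before conversion.
            String s = Integer.toBinaryString(b & 0xFF);
            for (int i = s.length(); i < 8; i++) {
                bits.append('0'); // left-pad to a fixed width of 8
            }
            bits.append(s);
        }
        return bits.toString();
    }

    // Splits the bit string into groups of 8, rebuilds the bytes, and decodes them as UTF-8.
    static String fromBits(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("bit string length must be a multiple of 8");
        }
        byte[] bytes = new byte[bits.length() / 8];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(bits.substring(8 * i, 8 * i + 8), 2);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00BC\u00B9 I.p";
        String bits = toBits(original);
        System.out.println(bits);
        System.out.println(fromBits(bits).equals(original));
    }
}
```

With this layout, ASCII characters cost 8 bits each instead of 16, and the decoder works out from the byte values how many bytes belong to each character.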

Christoph Bimminger
  • btw, when writing a file in Java, the default encoding is the platform default charset (UTF-8 is the default only since Java 18). You can specify a different output encoding by passing a charset when opening the OutputStreamWriter or similar (Charset.forName("iso-8859-1") or something like this) – Christoph Bimminger Jan 28 '17 at 12:46
  • but doesn't the UTF-8 String contain 16-bit characters? how can i store it in a byte (8-bit) array? – NoobScript Jan 28 '17 at 12:54
  • UTF-8 may store characters as 1-byte, 2-byte, and even up to 4-byte sequences. The 1-byte sequences are compatible with 7-bit ASCII. Differences arise only above that range (the so-called 8-bit or extended ASCII values, 128..255), which take two bytes in UTF-8. When storing a 1-byte UTF-8 char in a byte[] it will use only one element of the array. When storing a multibyte char in a byte[] you will see that several bytes of the byte[] are used to store this single character. – Christoph Bimminger Oct 21 '17 at 19:29
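
To see the variable lengths described in the last comment, a small check like the following can help. It is just a sketch; the class name `Utf8Lengths` and the helper `utf8Length` are made up for this example, and the sample characters were picked to cover the 1-, 2-, 3-, and 4-byte cases.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: how many bytes UTF-8 needs for different characters.
class Utf8Lengths {

    // Number of bytes the given text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "p",            // U+0070, plain ASCII
            "\u00BC",       // U+00BC, vulgar fraction one quarter
            "\u20AC",       // U+20AC, euro sign
            "\uD83D\uDE00"  // U+1F600, outside the Basic Multilingual Plane
        };
        for (String s : samples) {
            System.out.println(s.codePointAt(0) + " -> " + utf8Length(s) + " byte(s)");
        }
    }
}
```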