
I feel quite confused when looking at Java's API for converting between string encodings.

The APIs I found convert a string to a byte array or a byte array to a string. But none of the APIs that take a byte array and return a String accept both an input and an output encoding parameter.

Given that I can't find any documentation about this, I want to assume that when constructing a string like 'new String(buffer, encoding)', the encoding parameter represents the input encoding and the output encoding is constant and set to UTF-16.

Am I right to assume that?

user1708860
  • See http://stackoverflow.com/questions/19066042/convert-windows-1257-to-iso-8859-13-java-charset?rq=1 –  Jun 28 '16 at 11:35
  • A String is a string. Why do you assume that it needs an output encoding? You only need the latter when you want to create something **out** of a string. In other words: a string object is responsible for "being a sequence of characters". Turning that sequence into something else (using an "output encoding") is a different responsibility. Or, a different example: would you expect a string object to know about the resolution of your screen, so that it can decide how to print itself on the screen? Nope - a "screen printer" takes strings and somehow puts them on a screen ... – GhostCat Jun 28 '16 at 11:38
  • As per RC's answer, you can use different encodings for input and for output. The code fragment you wrote correctly specifies an input encoding. Only worry about the output encoding when you write the string... – vikingsteve Jun 28 '16 at 11:39
  • Your confusion comes from a lack of understanding what a [character encoding](https://en.wikipedia.org/wiki/Character_encoding) is (<= click the link). A character encoding defines how text characters are represented in bytes. A string consists of characters, and doesn't have an encoding by itself. The encoding only comes into play when you need to convert the characters to (or from) bytes, for example when you need to save them in a file or transport them over the network. – Jesper Jun 28 '16 at 11:43
  • @RC, thanks! This seems to be exactly what I needed – user1708860 Jun 28 '16 at 14:16
  • @Jesper, this isn't always true. Encoding can affect your business logic too if you are not careful. For example, you may read some data from a file so that you can search it later. Mixing data that comes in several encodings will not work well most of the time for non-English languages – user1708860 Jun 28 '16 at 14:24
  • @Jägermeister, because I'm writing it to a file and/or presenting it to my user, for example. If I convert between two encodings, I need to know both the source and the destination encoding – user1708860 Jun 28 '16 at 14:29
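A minimal sketch of the point made in the comments: a String holds characters, not bytes, so converting between two encodings is two separate steps, decoding with the source charset and then encoding with the destination charset. The charsets (ISO-8859-1 in, UTF-8 out) are assumptions chosen because every JVM is required to support them; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TranscodeViaString {
    // Hypothetical helper: re-encodes bytes from one charset into another.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        // Step 1: decode. The charset argument is the *input* encoding;
        // the resulting String just holds characters.
        String text = new String(input, from);
        // Step 2: encode. The *output* encoding is chosen only here.
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        // "caf\u00e9" encoded as ISO-8859-1: the accented e is the single byte 0xE9
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(utf8)); // [99, 97, 102, -61, -87]
    }
}
```

The intermediate String is the same regardless of which charset the bytes came from; only the two charset arguments decide how bytes are interpreted and produced.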

2 Answers


First, if you are not working only with UTF-16, don't use char or String; use byte[] instead, in order to avoid encoding problems.


You can create a specific Charset to read / write your byte[]:

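import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

Charset iso88598 = Charset.forName("ISO-8859-8");
Charset utf16 = Charset.forName("UTF-16");

Then wrap the bytes in a ByteBuffer and decode them:

byte[] inputData = ...; // your ISO-8859-8 encoded byte array
ByteBuffer inputBuffer = ByteBuffer.wrap(inputData);
CharBuffer data = iso88598.decode(inputBuffer);

Then encode the characters with the output charset:

ByteBuffer outputBuffer = utf16.encode(data);
// array() returns the whole backing array, which can be larger than the
// encoded data, so copy only the remaining (encoded) bytes
byte[] outputData = new byte[outputBuffer.remaining()];
outputBuffer.get(outputData);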
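The decode/encode fragments above can be combined into one self-contained method. This is a sketch under assumptions: the class and method names are hypothetical, and the demo uses ISO-8859-1 instead of ISO-8859-8 only because ISO-8859-1 is guaranteed on every JVM (a `Charset.forName("ISO-8859-8")` can be passed the same way where it is available).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteBufferTranscode {
    // Decode with the input charset, then encode with the output charset.
    static byte[] convert(byte[] inputData, Charset in, Charset out) {
        CharBuffer data = in.decode(ByteBuffer.wrap(inputData));
        ByteBuffer outputBuffer = out.encode(data);
        // The backing array can be larger than the encoded content,
        // so copy only the bytes between position and limit.
        byte[] outputData = new byte[outputBuffer.remaining()];
        outputBuffer.get(outputData);
        return outputData;
    }

    public static void main(String[] args) {
        byte[] latin1 = {0x41, (byte) 0xE9}; // "A\u00e9" in ISO-8859-1
        byte[] utf16 = convert(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16);
        // Java's "UTF-16" encoder writes a big-endian byte-order mark first
        System.out.println(Arrays.toString(utf16)); // [-2, -1, 0, 65, 0, -23]
    }
}
```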

EXTRA: If you need to write text to a file in a specific encoding, you can simply use:

PrintWriter out = new PrintWriter(file, "ISO-8859-8");
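A small runnable sketch of that PrintWriter constructor, writing to a temporary file and checking the resulting bytes. The class and method names are hypothetical, and ISO-8859-1 stands in for ISO-8859-8 in the demo only because it is guaranteed to be supported.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;

class WriteEncodedFile {
    // Writes text to a file using the named charset (e.g. "ISO-8859-8").
    static void write(File file, String text, String charsetName) throws IOException {
        // PrintWriter(File, String) throws UnsupportedEncodingException
        // if the charset name is not supported by this JVM.
        try (PrintWriter out = new PrintWriter(file, charsetName)) {
            out.print(text);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("encoded", ".txt");
        file.deleteOnExit();
        write(file, "caf\u00e9", "ISO-8859-1");
        // Prints 4: in ISO-8859-1 each of these characters is one byte.
        System.out.println(Files.readAllBytes(file.toPath()).length);
    }
}
```

Note that PrintWriter does not throw on write errors; `checkError()` reports them if that matters for your use case.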
Jordi Castilla

Java's String does not know about encoding issues: it represents text as a sequence of UTF-16 code units, period.
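A quick way to see the UTF-16 view: a character outside the Basic Multilingual Plane takes two `char` values (a surrogate pair), so `length()` counts code units rather than characters. (Strictly speaking, since Java 9 the JVM may store Latin-1-only strings more compactly, per JEP 254, but the String API still presents UTF-16 code units.) The class and field names are hypothetical.

```java
class Utf16View {
    // 'a' followed by U+1F600, which needs a surrogate pair in UTF-16
    static final String SAMPLE = "a\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                           // 3 UTF-16 code units
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length())); // 2 code points
        System.out.println(Character.isHighSurrogate(SAMPLE.charAt(1))); // true
    }
}
```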

The encoding only matters when converting the String to bytes or vice versa, e.g. when

  • creating a String from a byte[]
  • reading Strings via an InputStreamReader
  • converting a String to a byte[]
  • writing Strings via an OutputStreamWriter

...and it can be specified in all these cases.
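For the stream cases in that list, a minimal sketch that copies text from an InputStreamReader (input encoding) to an OutputStreamWriter (output encoding). The class and method names are hypothetical; in-memory streams are used so the example is self-contained, and UTF-8 to ISO-8859-1 is just an assumed pair of guaranteed charsets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamTranscode {
    // Copies text from `in` (encoded as `from`) to `out` (encoded as `to`).
    // Closing the reader/writer also closes the underlying streams.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to) throws IOException {
        try (Reader reader = new InputStreamReader(in, from);  // bytes -> chars
             Writer writer = new OutputStreamWriter(out, to)) { // chars -> bytes
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 5 bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8, out, StandardCharsets.ISO_8859_1);
        System.out.println(out.size()); // 4: one byte per character in ISO-8859-1
    }
}
```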

piet.t