
I have a pretty interesting topic, at least for me. Given a ByteArrayOutputStream holding bytes in, for example, UTF-8, I need a function that can "translate" those bytes into a new ByteArrayOutputStream in, for example, UTF-16, ASCII, or you name it. My naive approach was to use an InputStreamReader and pass in the desired encoding, but that didn't work, because it reads into a char[] and I can only write byte[] to the new BAOS.

public byte[] convertStream(Charset encoding) {
    ByteArrayInputStream original = new ByteArrayInputStream(raw.toByteArray());
    InputStreamReader contentReader = new InputStreamReader(original, encoding);
    ByteArrayOutputStream converted = new ByteArrayOutputStream();

    int readCount;
    char[] buffer = new char[4096];
    while ((readCount = contentReader.read(buffer, 0, buffer.length)) != -1)
        converted.write(buffer, 0, readCount);

    return converted.toByteArray();
}

Now, this obviously doesn't work and I'm looking for a way to make this scenario possible, without building a String out of the byte[].

@Edit: Since it seems rather hard to read the obvious things. 1) raw: a ByteArrayOutputStream containing the bytes of a BINARY object sent to us by clients. The bytes usually arrive in UTF-8 as part of an HTTP message. 2) The goal is to forward this BINARY data to an internal System that is not flexible (well, it is an internal System) and that accepts such attachments in UTF-16. I don't know why, don't even ask; it just does.

So to justify my question: Is there a way to convert a byte array from Charset A to Charset B, or to an encoding of your choice? Once again, building a String is NOT what I'm after.

Thank you and hope that clears up questionable parts :).

Display name
  • What is `raw`? You've only given us part of the information. I'd expect to just convert the bytes to a string, and then convert back from a string to a byte array. No need to use streams at all. – Jon Skeet Dec 22 '15 at 10:32
  • Well, raw is obviously a ByteArrayOutputStream containing the bytes of some binary data, in whatever encoding our client used. We have to transfer this data to our System in utf-8 format, so we need to convert the whatever to utf-8 or whatever. I hope that clears it up. Building a string is out of the question right now. – Display name Dec 22 '15 at 10:35
  • *Why* is building a string out of the question? If the most obvious approach is inappropriate, you need to explain *why* that's the case. And the benefit of a short but complete example is that what you consider "obvious" is spelled out in the code. Far too often I've made assumptions that seem "obvious" to me, but turn out not to be... and when you're now adding restrictions as to what is feasible and what isn't, that adds to the confusion. – Jon Skeet Dec 22 '15 at 10:38
  • if it concerns charset, see this: http://stackoverflow.com/questions/229015/encoding-conversion-in-java – guillaume girod-vitouchkina Dec 22 '15 at 10:41
  • Well maybe, but just maybe, if you read the question you'd see that I have a ByteArrayOutputStream containing data, obviously bytes, in for example UTF-16. Then you see in the code this magic raw... now what might that be? Seriously... And I asked a question stating WITHOUT building a string... Obviously I know I can just do new String(raw.toByteArray()).getBytes(encoding). Maybe, but just maybe, I really want a solution to my actual question and not something that doesn't fit my needs. To keep things short, I'll update the question just for you! – Display name Dec 22 '15 at 10:42
  • But the answer building a string up *does* answer your original question. There was nothing in that original question to explain why you wouldn't want to do that. You still haven't said *why* you refuse to create a string. And being rude to people trying to help you is a really, really bad idea. – Jon Skeet Dec 22 '15 at 10:47
  • It's not about being rude. I clearly stated that I do not wish to build a String and that I'm looking for a solution along the lines of what I've shown. So why give a solution that builds a String when I clearly stated I don't need that? How would you react if you'd given a task to your subordinate and he brought you an apple because, hey, he knows you want an apple? – Display name Dec 22 '15 at 15:01
  • In what way is "Since it seems rather hard to read the obvious things" not rude? And again, going via a string is the obvious way to accomplish the task, so it's natural to ask whether there's any good reason not to do that. You have *still* not provided any justification. – Jon Skeet Dec 22 '15 at 16:38
  • In particular, bear in mind that the main purpose of Stack Overflow is to provide a repository for questions and answers for future readers. Without providing any reason *why* you have this strange requirement of not using a `String` as an intermediate representation, it won't be clear to future readers whether they should use that option or not. The other reason to provide justification for unusual requirements is that such requirements are often the result of misinformation, and it's often better to investigate whether a requirement is valid or not than satisfy an unnecessary one. – Jon Skeet Dec 22 '15 at 18:28

1 Answer


As mentioned in comments, I'd just convert to a string:

String text = new String(raw.toByteArray(), encoding);
byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

However, if that's not feasible (for some unspecified reason...) what you've got now is nearly there - you just need to add an OutputStreamWriter into the mix:

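// Requires java.io.* plus java.nio.charset.Charset and java.nio.charset.StandardCharsets.
// Everything here is in memory, so IOException should never actually be thrown;
// work out how you want to handle it.
public byte[] convertStream(Charset encoding) throws IOException {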
    ByteArrayInputStream original = new ByteArrayInputStream(raw.toByteArray());
    InputStreamReader contentReader = new InputStreamReader(original, encoding);

    int readCount;
    char[] buffer = new char[4096];
    try (ByteArrayOutputStream converted = new ByteArrayOutputStream()) {
        try (Writer writer = new OutputStreamWriter(converted, StandardCharsets.UTF_8)) {
            while ((readCount = contentReader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, readCount);
            }
        }
        return converted.toByteArray();
    }
}

Note that you're still creating an extra temporary copy of the data in memory, admittedly in UTF-8 rather than UTF-16... but fundamentally this is hardly any more efficient than creating a string.
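For reference, here is a self-contained sketch of the same Reader/Writer approach. It assumes the source bytes and both charsets are passed in as parameters instead of being read from the `raw` field; the class name `CharsetTranscoder` and the method `transcode` are hypothetical, chosen only for illustration. It also wraps the (in practice unreachable) IOException in an UncheckedIOException, which is one possible choice, not the only one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class; the name is illustrative and not from the question.
final class CharsetTranscoder {
    private CharsetTranscoder() {}

    // Decodes 'input' using 'source' and re-encodes it using 'target',
    // going through a fixed-size char[] instead of a String.
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        ByteArrayOutputStream converted = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), source);
             Writer writer = new OutputStreamWriter(converted, target)) {
            char[] buffer = new char[4096];
            int readCount;
            while ((readCount = reader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, readCount);
            }
        } catch (IOException e) {
            // Purely in-memory streams should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        // The writer was closed (and therefore flushed) by the try-with-resources block.
        return converted.toByteArray();
    }
}
```

For the UTF-16 target described in the question, a call might look like `CharsetTranscoder.transcode(raw.toByteArray(), StandardCharsets.UTF_8, StandardCharsets.UTF_16)`. Be aware that Java's `UTF_16` charset encodes as big-endian with a leading byte-order mark, while `UTF_16BE` and `UTF_16LE` omit the mark, so pick whichever the receiving system expects.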

If memory efficiency is a particular concern, you could perform multiple passes in order to work out how many bytes will be required, create a byte array of the right length, and then adjust the code to write straight into that byte array.
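A minimal sketch of that multi-pass idea, assuming `CharsetDecoder` and `CharsetEncoder` from `java.nio.charset` are used directly. The class `TwoPassTranscoder` and its methods are hypothetical names. The first pass encodes into a reusable scratch buffer purely to count bytes; the second pass encodes again, straight into an array of exactly that size. Replacing malformed or unmappable input (rather than rejecting it) is an assumption made here to mirror what `new String(...)` and `getBytes(...)` do.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper; names are illustrative, not from the original answer.
final class TwoPassTranscoder {
    private TwoPassTranscoder() {}

    static byte[] transcode(byte[] input, Charset source, Charset target) {
        int size = run(input, source, target, null);          // pass 1: count only
        byte[] result = new byte[size];
        run(input, source, target, ByteBuffer.wrap(result));  // pass 2: write in place
        return result;
    }

    // Decodes and re-encodes chunk by chunk; returns the number of bytes produced.
    // A null 'dest' means "count only, discard the output".
    private static int run(byte[] input, Charset source, Charset target, ByteBuffer dest) {
        CharsetDecoder decoder = source.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharsetEncoder encoder = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        boolean counting = (dest == null);
        ByteBuffer out = counting ? ByteBuffer.allocate(8192) : dest;
        ByteBuffer in = ByteBuffer.wrap(input);
        CharBuffer chars = CharBuffer.allocate(4096);
        int total = 0;

        CoderResult result;
        do { // decode until every input byte has been consumed
            result = decoder.decode(in, chars, true);
            total += encode(encoder, chars, out, false, counting);
        } while (result.isOverflow());
        do { // let the decoder emit anything it buffered internally
            result = decoder.flush(chars);
            total += encode(encoder, chars, out, false, counting);
        } while (result.isOverflow());

        total += encode(encoder, chars, out, true, counting); // signal end of input
        do {
            int before = out.position();
            result = encoder.flush(out);
            total += out.position() - before;
            afterWrite(result, out, counting);
        } while (result.isOverflow());
        return total;
    }

    // Encodes the pending chars and keeps any unconsumed tail
    // (for example a lone high surrogate) for the next call.
    private static int encode(CharsetEncoder encoder, CharBuffer chars, ByteBuffer out,
                              boolean endOfInput, boolean counting) {
        int produced = 0;
        CoderResult result;
        chars.flip();
        do {
            int before = out.position();
            result = encoder.encode(chars, out, endOfInput);
            produced += out.position() - before;
            afterWrite(result, out, counting);
        } while (result.isOverflow());
        chars.compact();
        return produced;
    }

    private static void afterWrite(CoderResult result, ByteBuffer out, boolean counting) {
        if (counting) {
            out.clear(); // scratch buffer: only the byte count matters
        } else if (result.isOverflow()) {
            throw new IllegalStateException("second pass produced more bytes than counted");
        }
    }
}
```

The trade-off is that the input is decoded and encoded twice, so this spends extra CPU time to keep the working memory down to two small fixed-size buffers plus the final array.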

Jon Skeet