4

Many Java standard and third-party libraries have methods in their public API that write to or read from a stream. One example is javax.imageio.ImageIO.write(), which takes an OutputStream and writes the content of a processed image to it. Another is the iText PDF processing library, which takes an OutputStream and writes the resulting PDF to it. A third is the AmazonS3 Java API, which takes an InputStream, reads it, and creates a file in their S3 storage.

The problem arises when you want to combine two of these. For example, I have an image as a BufferedImage, for which I have to use ImageIO.write to push the result into an OutputStream. But there is no direct way to push it to Amazon S3, as S3 requires an InputStream.
There are a few ways to work this out, but the subject of this question is the use of ByteArrayOutputStream.

The idea behind ByteArrayOutputStream is to use an intermediate byte array wrapped in an input/output stream, so that the code that wants to write to an output stream writes to the array, and the code that wants to read reads from the array.

What I wonder is why ByteArrayOutputStream does not allow any access to the byte array without copying it, for example by providing an InputStream that has direct access to it. The only way to access it is to call toByteArray(), which makes a copy of the internal array (in the standard implementation). This means that, in my image example, I will have three copies of the image in memory:

  • First is the actual BufferedImage,
  • second is the internal array of the OutputStream and
  • third is the copy produced by toByteArray() so I can create the InputStream.
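To make the three copies concrete, here is a minimal sketch of the pattern described above. The class and method names (`ImageCopyDemo`, `toInputStream`) are made up for illustration, and the PNG format is an arbitrary choice; the resulting InputStream stands in for whatever would be handed to an S3 upload call.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

// Hypothetical demo class, not part of any library.
class ImageCopyDemo {
    // Encodes the image in memory and returns an InputStream over the encoded bytes.
    static InputStream toInputStream(BufferedImage image) throws IOException {
        // Second copy: the encoded image accumulates in the stream's internal array.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);
        // Third copy: toByteArray() allocates a new array and copies the contents.
        byte[] encoded = out.toByteArray();
        // ByteArrayInputStream wraps the given array; it does not copy again.
        return new ByteArrayInputStream(encoded);
    }

    public static void main(String[] args) throws IOException {
        // First copy: the pixel data held by the BufferedImage itself.
        BufferedImage image = new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB);
        try (InputStream in = toInputStream(image)) {
            System.out.println("encoded bytes available: " + in.available());
        }
    }
}
```

Strictly speaking, the second and third items are copies of the encoded image, not of the raw pixels, but all three are alive in memory at the same time unless references are dropped early.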

How is this design justified?

  • Hiding implementation? Just provide getInputStream(), and the implementation stays hidden.
  • Multi-threading? ByteArrayOutputStream is not suited for access by multiple threads anyway, so this cannot be the reason.

Moreover, there is a second flavor of ByteArrayOutputStream, provided by Apache's commons-io library (which has a different internal implementation). But both have exactly the same public interface, which provides no way to access the byte array without copying it.

Op De Cirkel
  • Without commenting on the actual question - you don't actually need to have three copies of it around at once - as you describe it, you can discard the `BufferedImage` before you call `toByteArray()` to build the third copy. – Jonathan Rupp Jun 13 '11 at 00:19
  • this is not a discussion forum, and you are trying to turn this Question into a debate. Alternatively, if you are trying to get something done about this, you are talking to the wrong people. You might have more luck if you put together a concrete proposal >>WITH WORKING CODE<< and lots of motivating examples, and submitted it to the Apache Commons IO folks. – Stephen C Jun 13 '11 at 01:12

3 Answers

6

What I wonder is why ByteArrayOutputStream does not allow any access to the byte array without copying it, for example by providing an InputStream that has direct access to it.

I can think of four reasons:

  • The current implementation uses a single byte array, but it could also be implemented as a linked list of byte arrays, deferring the creation of the final array until the application asks for it. If the application could see the actual byte buffer, it would have to be a single array.

  • Contrary to your understanding, ByteArrayOutputStream is thread safe and is suitable for use in multi-threaded applications. But if direct access were provided to the byte array, it is difficult to see how that could be synchronized without creating other problems.

  • The API would need to be more complicated because the application also needs to know where the current buffer high water mark is, and whether the byte array is (still) the live byte array. (The ByteArrayOutputStream implementation occasionally needs to reallocate the byte array ... and that will leave the application holding a reference to an array that is no longer the array.)

  • When you expose the byte array, you allow an application to modify the contents of the array, which could be problematic.
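The reallocation point can be illustrated with a small sketch. `LeakyStream` and `rawBuffer()` are hypothetical names introduced only for this demonstration; they lean on the fact that the internal array is a protected field (`buf`) in the standard class.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class, not part of any library.
class StaleBufferDemo {
    // Hypothetical subclass that hands out the protected internal array.
    static class LeakyStream extends ByteArrayOutputStream {
        LeakyStream(int size) {
            super(size);
        }

        byte[] rawBuffer() {
            return buf;
        }
    }

    // Returns true if an array obtained earlier is still the live buffer after more writes.
    static boolean sameBufferAfterGrowth() {
        LeakyStream out = new LeakyStream(4);   // deliberately tiny initial capacity
        out.write(1);
        byte[] before = out.rawBuffer();
        out.write(new byte[64], 0, 64);         // exceeds the capacity, so the stream reallocates
        return before == out.rawBuffer();
    }

    public static void main(String[] args) {
        System.out.println(sameBufferAfterGrowth()); // prints false
    }
}
```

After the second write, `before` still holds only the first byte in a four-byte array, while the stream has moved on to a larger one. Any API that exposed the array would have to explain this to callers.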


How is this design justified?

The design is tailored for simpler use-cases than yours. The Java SE class libraries don't aim to support all possible use-cases. But they don't prevent you (or a 3rd party library) from providing other stream classes for other use-cases.


The bottom line is that the Sun designers decided NOT to expose the byte array for ByteArrayOutputStream, and (IMO) you are unlikely to change their minds.

(And if you want to try, this is not the right place to do it.

  • Try submitting an RFE via the Bugs database.
  • Or develop a patch that adds the functionality and submit it to the OpenJDK team via the relevant channels. You would increase your chances if you included comprehensive unit tests and documentation.)

You might have more success convincing the Apache Commons IO developers of the rightness of your arguments, provided that you can come up with an API design that isn't too dangerous.

Alternatively, there's nothing stopping you from just implementing your own special purpose version that exposes its internal data structures. The code is GPL'ed so you can copy it ... subject to the normal GPL rules about code distribution.

Stephen C
  • >> _The current implementation uses a single byte array_ .... That doesn't prevent you from having an `InputStream` that knows all the internals and does exactly what is needed. BTW, commons-io has a different implementation and a `private` method that actually provides an `InputStream` – Op De Cirkel Jun 13 '11 at 00:33
  • >> _is thread safe_: I am sorry I was not clear on this. I didn't want to say that it is not _thread safe_. It is; you cannot corrupt the internal state. But how useful is it to write to it from multiple threads? It is similar to _StringBuffer_, whose thread safety is almost useless. – Op De Cirkel Jun 13 '11 at 00:56
  • >> _The API would need to be more complicated..._ I guess you want to say the implementation would be more complicated, but that is not the concern. – Op De Cirkel Jun 13 '11 at 00:58
  • >> _When you expose the byte array..._ I did not ask to expose the byte array. But my idea was to have something like `getInputStream()` that will actually have _read-only_ access. – Op De Cirkel Jun 13 '11 at 01:00
  • @Stephen Most/all of the reasons you've mentioned talk about the _internal implementation_ and not about the interface. – Op De Cirkel Jun 13 '11 at 01:04
  • @Op De Cirkel - I'm not going to debate this. SO is not a discussion forum. – Stephen C Jun 13 '11 at 01:12
2

I think that the behavior you are looking for is a Pipe. A ByteArrayOutputStream is just an OutputStream, not an input/output stream. It wasn't designed for what you have in mind.
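As a rough sketch of the pipe approach, using the standard `PipedInputStream` and `PipedOutputStream` classes: the producer writes on one thread while the consumer reads on another, so the whole payload never has to be held in a single array. `PipeDemo` and `roundTrip` are hypothetical names, and the consumer here only collects the bytes as a stand-in for a real reader such as an upload call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of any library.
class PipeDemo {
    // Sends payload through a pipe and returns what the reading side received.
    static byte[] roundTrip(byte[] payload) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        // Producer: must run on its own thread, otherwise a full pipe blocks forever.
        Thread producer = new Thread(() -> {
            try (OutputStream o = out) {
                o.write(payload);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        // Consumer: stand-in for code that requires an InputStream.
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        try (InputStream i = in) {
            byte[] chunk = new byte[1024];
            int n;
            while ((n = i.read(chunk)) != -1) {
                received.write(chunk, 0, n);
            }
        }
        producer.join();
        return received.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] result = roundTrip(new byte[5000]);
        System.out.println(result.length); // prints 5000
    }
}
```

The trade-off is the extra thread and the need to close the writing side so that the reader sees end of stream.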

Ted Hopp
  • I know that pipes are a way to approach the problem, and that approach has its own pros and cons. But my question is about ByteArrayOutputStream. _>> It wasn't designed for what you have in mind._ Whatever the purpose is, you always have to duplicate the array? It simply doesn't feel right. – Op De Cirkel Jun 13 '11 at 00:29
  • It's an easy way to capture byte-oriented output that would otherwise have to go to a file, socket, or some other hard-to-recover destination. The purpose of forcing a copy of the contents is to insulate the result from the effects of further writes (kind of the opposite of what you want, I gather). – Ted Hopp Jun 13 '11 at 00:41
2

Luckily, the internal array is a protected field, so you can subclass ByteArrayOutputStream and wrap a ByteArrayInputStream around the array, without any copying.
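A minimal sketch of that idea follows. The subclass name and its `toInputStream()` method are hypothetical; only the protected fields `buf` and `count` come from the standard class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical subclass; neither the name nor toInputStream() exists in the JDK.
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    // Wraps the live internal array (buf, valid up to count) without copying it.
    synchronized ByteArrayInputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}

// Hypothetical demo class.
class ExposedDemo {
    public static void main(String[] args) {
        ExposedByteArrayOutputStream out = new ExposedByteArrayOutputStream();
        out.write(new byte[] {10, 20, 30}, 0, 3);
        ByteArrayInputStream in = out.toInputStream();
        System.out.println(in.available()); // prints 3
    }
}
```

The returned stream only sees the bytes written up to the moment it was created. It is safest to treat the output stream as finished at that point, since later writes either go unseen or land in a newly allocated array.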

irreputable
  • That is another bad thing, as the class is now locked to that implementation/internals because it exposed them. – Op De Cirkel Jun 13 '11 at 01:07
  • This will fail if the internal array needs to be reallocated due to continued output into the `ByteArrayOutputStream` after the internal array has been wrapped in a `ByteArrayInputStream`. The reallocation cannot be monitored because it takes place in a `private` method. – Ted Hopp Jul 24 '13 at 00:46