
For java.io.InputStream, there are two primary read functions: int read() and int read(byte[] b, int off, int len).

Similarly, for java.io.OutputStream there are two functions: write(int b) and write(byte[] b, int off, int len).

I understand the basic difference, but the description of write(b) says: "The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored." If that's the case, then we are actually wasting the remaining 24 of the 32 bits that the CPU would load for an integer. If I instead use write(byte[] b, int off, int len), then I am occupying heap/stack space for the size of the byte array. While trying to work out which one scales better, I can't ignore that write(b) wastes 24 bits (3 bytes), while on the other hand, if I use read/write(byte[] b, int off, int len), I risk larger stack sizes. So, which is the best option to choose?

As a workaround, I tried extending InputStream and OutputStream and overriding the read() and write(b) functions to use a byte[4], so that all 32 bits are used. It works just fine, but I still have to see whether it brings any performance improvement. It's very similar to calling read/write(b, 0, 4) with a 4-byte array b.

I will appreciate any help/comment on this topic.

Ashley

2 Answers


Using the version that takes a byte[] doesn't push the entire byte array onto the stack. Only a reference to the byte array is pushed onto the stack.

Unless you are only writing a single byte, it's always better to use the version that takes a byte[].
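To make the reference-passing point concrete, here is a minimal sketch. The class and method names (ArrayReferenceDemo, fillFirst, writeBulk) are made up for illustration and are not part of any library. If the array were copied when passed, the change made inside fillFirst would not be visible to the caller.

```java
import java.io.ByteArrayOutputStream;

public class ArrayReferenceDemo {
    // Hypothetical helper: writes into the array it was handed.
    // Only the reference is passed, so this modifies the caller's array.
    static void fillFirst(byte[] buf) {
        buf[0] = 42;
    }

    // Hypothetical helper: writes a whole array with a single bulk call.
    static byte[] writeBulk(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(data, 0, data.length); // one call for the whole range
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4096]; // the array itself lives on the heap
        fillFirst(buf);
        System.out.println(buf[0]);                // prints 42
        System.out.println(writeBulk(buf).length); // prints 4096
    }
}
```

The local variable buf holds only a reference, so the cost of passing it to a method does not depend on the array's length.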

mpontillo
  • Thanks Mike. Don't you think that when the array is a local variable of a function, the entire array would be copied over to the stack? When the byte array is a class variable, I understand that it would naturally be a reference, since the byte[] declaration is on the heap. Please let me know what you think. – Ashley Sep 23 '13 at 20:44
  • No. You can find more information about this in the [Java Language Specification](http://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html). Arrays are actually considered by the language to be objects, and are always passed around as such. – mpontillo Sep 23 '13 at 20:47
  • Thanks again Mike. I think your clarification has cleared my doubts. In the post below, @csoroiu suggests using a byte array of size 4K/8K (equal to a sector on the hard disk). Declaring 4K/8K bytes is a great tip. However, don't you think a chunk equivalent to a hard disk sector will be large on the heap? – Ashley Sep 23 '13 at 20:51
  • In my view, the more bytes you write at a time, the better (in general). A few kilobytes of space shouldn't be an issue, depending on scale (how many threads? how often would you allocate the space? etc.). – mpontillo Sep 24 '13 at 00:07
  • Thank you. Any comment on write(b) wasting 24 bits (3 bytes)? – Ashley Sep 24 '13 at 14:14
  • No. I have no idea why they designed it that way rather than having the function take a single `byte`. A `byte` is the smallest thing you'd typically want to write to a file, so that would make sense. I would assume that most processors the JVM is written for are 32-bit (or greater), so there isn't a huge difference between loading an 8-bit value and loading a 32-bit value into a CPU register. – mpontillo Sep 24 '13 at 21:12
  • See [this question](http://stackoverflow.com/questions/1407893/) for more discussion about this topic. – mpontillo Sep 24 '13 at 21:14
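As a small illustration of the behavior discussed in these comments, the sketch below shows that write(int) keeps only the low 8 bits, and that read() returns an int so that the byte value 0xFF (returned as 255) can be told apart from the end-of-stream marker -1. That the int parameter of write mirrors read() is a commonly offered explanation and is an assumption here, not something confirmed in this thread. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class LowByteDemo {
    // Writes one int with write(int) and returns the single byte that was stored.
    static int writtenByte(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value);                   // the 24 high-order bits are ignored
        return out.toByteArray()[0] & 0xFF; // view the stored byte as unsigned
    }

    // Returns the result of a single read(): 0..255 for a byte, -1 at end of stream.
    static int readFirst(byte[] data) {
        return new ByteArrayInputStream(data).read();
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(writtenByte(0x12345678))); // prints 78
        System.out.println(readFirst(new byte[] {(byte) 0xFF}));          // prints 255
        System.out.println(readFirst(new byte[0]));                       // prints -1
    }
}
```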

If you want speed, you should read/write several bytes at a time. For instance, if you are reading from or writing to disk, you might want to read/write full sectors (4K or 8K bytes) at a time.

Doing this also minimizes the number of system calls, so the application will be faster.

Regarding the stack: in Java the byte array will be on the heap, and only a reference to it is stored on the stack, as @Mike mentioned.
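A minimal sketch of the chunked approach follows. The 8 KB buffer size and the names ChunkedCopy and copy are assumptions chosen for illustration; the best buffer size depends on the underlying device and should be measured rather than taken from this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedCopy {
    // Assumed buffer size (8 KB); not a value mandated by the Java API.
    static final int BUFFER_SIZE = 8 * 1024;

    // Copies everything from in to out in chunks and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE]; // allocated once, on the heap
        long total = 0;
        int n;
        // read(...) returns the number of bytes actually read, or -1 at end of stream
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n); // write only the bytes that were read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20_000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(data), out)); // prints 20000
    }
}
```

With file or socket streams in place of the in-memory streams used here, each loop iteration typically results in far fewer underlying calls than reading and writing one byte at a time.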

Claudiu
  • Thanks. Declaring 4K/8K bytes is a great tip. However, don't you think a chunk equivalent to a hard disk sector will be large on the heap? And don't you think that when the array is a local variable of a function, the entire array would be copied over to the stack? When the byte array is a class variable, I understand that it would naturally be a reference, since the byte[] declaration is on the heap. – Ashley Sep 23 '13 at 20:48
  • @Ashley, I think you have a misconception about how arrays work in Java. See [this question](http://stackoverflow.com/questions/2099695/java-array-is-stored-in-stack-or-heap). Arrays are always stored on the heap. – mpontillo Sep 23 '13 at 20:50
  • @Ashley 4K/8K is quite small compared to the other things the JVM stores in memory for an application. – Claudiu Sep 23 '13 at 20:52
  • Thanks again. I think it's clear now that arrays are always stored on the heap, irrespective of whether they are declared at class level or inside a function. This also clarifies why read/write(byte[]) are better. On the other hand, I am a bit confused by your answer about the disk sector. 4K/8K is small for a hard disk, but for an array size (when it's loaded into memory), it's still a large number, isn't it? – Ashley Sep 23 '13 at 20:58
  • @Ashley if you have a PC with 1GB=1024MB=1048576KB of RAM for instance, then 4KB is really, really small. – Claudiu Sep 23 '13 at 21:02
  • @csoroiu. Thanks again. For some reason I kept reading 4K/8K and thinking 4M/8M, and I think that's what confused me. Yes, 4K/8K is a fair limit. I checked my Windows server hard drive using the "fsutil fsinfo ntfsinfo c:" command and found the cluster size to be 4096. Thanks again. – Ashley Sep 24 '13 at 14:11
  • @csoroiu. Any comments on write(b) wasting 24 bits (3 bytes)? – Ashley Sep 24 '13 at 14:13
  • @Ashley When dealing with I/O, writing less than a sector costs the same as writing a full sector. It is true that 3 bytes are wasted, but wasting them can be useful from some perspectives, and 3 bytes are not a big deal. If the parameter type had been byte, the JIT might have inserted some NOP instructions to keep the machine code aligned for optimal execution. Sometimes it is better to waste bytes in favor of performance. I hope this helps and fits your needs: http://stackoverflow.com/questions/5217855/machine-code-alignment – Claudiu Sep 24 '13 at 17:15