
Possible Duplicate:
How do you determine the ideal buffer size when using FileInputStream?

Is fread($file, 8192) any better or safer than fread($file, 10000)? Why do most examples use a power of two?

matthewdaniel

3 Answers


Please see this great accepted answer to this question: How do you determine the ideal buffer size when using FileInputStream?.

Most file systems are configured to use block sizes of 4096 or 8192. In theory, if you configure your buffer size so you are reading a few bytes more than a disk block, the file-system operations can be extremely inefficient (i.e. if you configured your buffer to read 4100 bytes at a time, each read would require 2 block reads by the file system). If the blocks are already in cache, then you wind up paying the price of RAM -> L3/L2 cache latency. If you are unlucky and the blocks are not in cache yet, then you pay the price of the disk -> RAM latency as well.

This is why you see most buffers sized as a power of 2, and generally larger than (or equal to) the disk block size. This means that one of your stream reads could result in multiple disk block reads - but those reads will always use a full block - no wasted reads.

Although the linked question is Java-related, the accepted answer is essentially language-agnostic, and it covers all the factors I'm aware of regarding buffer sizes.
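In PHP terms, a minimal sketch of what block-aligned reading looks like (the filename and data size here are just illustrative):

```php
<?php
// Sketch: read a file in 8192-byte chunks -- a power-of-two buffer
// size that is a whole multiple of typical 4096/8192-byte disk blocks,
// so each fread() maps onto full block reads with no wasted partial reads.
$path = 'example.bin';
file_put_contents($path, str_repeat('x', 20000)); // sample data

$handle = fopen($path, 'rb');
$total  = 0;
while (!feof($handle)) {
    $chunk  = fread($handle, 8192); // block-aligned read
    $total += strlen($chunk);
}
fclose($handle);
unlink($path);

echo $total, "\n"; // total bytes read
```

Whether the size is 8192 or 10000, `fread()` returns the same data; the alignment only matters for how the underlying block reads line up.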

andr
  • I've taken the liberty to include what I believed to be the relevant part of the answer, feel free to modify of course. – Wesley Murch Dec 04 '12 at 17:21
  • Thanks @WesleyMurch - it's more elegant and easier to read with the quote. I'll keep that in mind next time. – andr Dec 04 '12 at 17:34

Either because:

  • when picking arbitrary numbers, programmers like to pick powers of two, or
  • in a premature optimization, the programmer thinks that reading in multiples of the block size will give some sort of speed boost.
Andy Lester
  • Or maybe you can explain this from the manual: *"if the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made; depending on the previously buffered data, the size of the returned data may be larger than the chunk size."* – Wesley Murch Dec 04 '12 at 16:59
  • It *is* actually more efficient to have data aligned on word boundaries, so I'd *like to think* this is an artifact from when programmers actually took the time to hand optimise their code. – Leigh Dec 04 '12 at 17:00
  • I've downvoted since the answer appears to be *only* vague speculation without any backing or reference. Even the second statement is misleading. *Is* it a legit (albeit, tiny) optimization or not? – Wesley Murch Dec 04 '12 at 17:02
  • Or rather, when programmers actually *needed* to hand optimize their code in all but the rarest cases. – Andy Lester Dec 04 '12 at 17:03

Operating systems allocate memory in pages (typically 4k, but sometimes 8k).

In this case, using a buffer size that is a multiple of 8192 bytes makes for more efficient memory allocation (since it also caters for multiples of 4096 bytes).

If you request 13k of memory, 16k will be used anyway, so why not ask for 16k to start with?

CPU instruction sets are also optimised to work with data that is aligned to certain boundaries, be it 32, 64, or 128 bits. Working with data aligned to some odd boundary instead adds extra processing overhead.

None of this is specific to PHP. PHP itself uses the Zend Memory Manager on top of the OS's own memory management; it probably allocates larger blocks of memory up-front and takes the concern of memory management away from the user.
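The page-rounding point above can be sketched with a small helper (`round_up_to_page` is not a PHP built-in, just an illustration, and 4096 is an assumed page size):

```php
<?php
// Illustrative helper: round a requested buffer size up to the next
// multiple of an assumed 4096-byte page, mimicking what an allocator
// effectively does when it hands out whole pages.
function round_up_to_page(int $bytes, int $page = 4096): int {
    return intdiv($bytes + $page - 1, $page) * $page;
}

echo round_up_to_page(13 * 1024), "\n"; // a 13k request occupies 16k anyway
echo round_up_to_page(8192), "\n";      // an aligned 8k request stays 8k
```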

Leigh