6

I need to call the ReadFile function of the Windows API:

BOOL WINAPI ReadFile(
  _In_        HANDLE       hFile,
  _Out_       LPVOID       lpBuffer,
  _In_        DWORD        nNumberOfBytesToRead,
  _Out_opt_   LPDWORD      lpNumberOfBytesRead,
  _Inout_opt_ LPOVERLAPPED lpOverlapped
);

The argument I'm interested in is the 3rd one:

nNumberOfBytesToRead [in]

The maximum number of bytes to be read.

I'm not interested so much in the "magic number" to put there but the process a seasoned programmer takes to determine the number to put there, preferably in numbered steps.

Also keep in mind I am writing my program in assembler so I'm more interested in the thought process from that perspective.


  • Related: http://stackoverflow.com/questions/236861/how-do-you-determine-the-ideal-buffer-size-when-using-fileinputstream I'd recommend trying multiples of 4096 and looking at the benchmark results (a minimal benchmark sketch follows these comments). In general, too-small buffers have a severe impact on performance, but too-large buffers are only marginally less efficient. Cache effects, for example, need to be taken into consideration. – tux3 Jan 28 '16 at 16:03
  • This question might be a little too broad. Are you loading the entire file into memory so you can then have fast random access or do you just want to process it once sequentially? What governs how you'll make speed/space trade-offs? How did you decide this was the right API to use in the first place? Perhaps you should use a wrapper that provides caching and/or normalization. Are you going to use asynchronous i/o? – Adrian McCarthy Jan 28 '16 at 19:10
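
A minimal sketch of the benchmarking approach tux3 suggests, written in C rather than assembler for brevity: time a full sequential read of the same file at several buffer sizes that are multiples of 4096 and compare. "test.bin" is a placeholder file name, error handling is trimmed, and note that the first run warms the OS cache, so repeat the runs or the numbers will be skewed:

/* Benchmark sketch: read a whole file sequentially with a given
   buffer size and report the elapsed time. */
#include <windows.h>
#include <stdio.h>

static double TimeRead(const char *path, DWORD bufSize)
{
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) return -1.0;

    BYTE *buf = (BYTE *)VirtualAlloc(NULL, bufSize, MEM_COMMIT, PAGE_READWRITE);
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);

    DWORD bytesRead;
    while (ReadFile(h, buf, bufSize, &bytesRead, NULL) && bytesRead > 0)
        ;  /* discard the data; we only measure the I/O itself */

    QueryPerformanceCounter(&t1);
    VirtualFree(buf, 0, MEM_RELEASE);
    CloseHandle(h);
    return (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
}

int main(void)
{
    /* Try 4 KiB up to 256 KiB, doubling each time. */
    for (DWORD size = 4096; size <= 4096 * 64; size *= 2)
        printf("%7lu bytes: %.3f s\n", size, TimeRead("test.bin", size));
    return 0;
}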

2 Answers

3

This requires plenty of insight into both Windows and your hardware. But, in general, here are some possible directions:

  • Is the read buffered or unbuffered? If unbuffered, then you may not even be free to choose the size, but have to follow strict rules for both the size and the alignment of the buffer (see the sketch after this list).
  • In general, you'd want to let the operating system handle as much of the work as possible, because it knows a lot more about the storage device itself and its various users than you do in userspace. So you might want to fetch the whole thing at once, if possible (see points below).
  • If it turns out that that isn't good enough, you may try to outsmart it by experimenting with various sizes, to catch cases where data is already sitting in the OS's buffers but where, for some reason, it wouldn't reuse those buffers for differently shaped requests.
  • Otherwise, you might play around with sizes ranging anywhere between the disk sector size and multiples of the page size, as these are most likely to already be cached somewhere, and also to map directly to actual hardware requests.
  • Other than performance, there's the question of how much you can afford to store in your process's memory at any given time.
  • There's also the question of sending large requests which might block other processes from getting the chance to get in there and get some data in between—if the OS doesn't already take care of that somehow.
  • There's also the possibility that, by requesting overly large chunks, you'll cause the OS to defer your request until other processes get their smaller ones served. On the flip side, if your request overlaps addresses those processes need, the OS might actually serve yours first in order to then serve the others from the cache.

In general, you'd probably want to play around until you get something that works well enough.
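
To make the first bullet concrete, here is a minimal C sketch of the unbuffered case, assuming FILE_FLAG_NO_BUFFERING: both the buffer address and nNumberOfBytesToRead must be multiples of the volume's sector size. "C:\\" and "test.bin" are placeholders and error handling is trimmed:

#include <windows.h>

int main(void)
{
    /* Query the sector size of the volume the file lives on. */
    DWORD sectorsPerCluster, bytesPerSector, freeClusters, totalClusters;
    GetDiskFreeSpaceA("C:\\", &sectorsPerCluster, &bytesPerSector,
                      &freeClusters, &totalClusters);

    /* Read 16 sectors at a time; VirtualAlloc returns page-aligned
       memory, which satisfies the sector-alignment requirement. */
    DWORD chunk = 16 * bytesPerSector;
    BYTE *buf = (BYTE *)VirtualAlloc(NULL, chunk, MEM_COMMIT, PAGE_READWRITE);

    HANDLE h = CreateFileA("test.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);
    DWORD bytesRead;
    ReadFile(h, buf, chunk, &bytesRead, NULL);  /* chunk is sector-aligned */

    CloseHandle(h);
    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}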

Yam Marcovic
  • "sizes ranging anywhere between the disk sector size and the page size" <- I doubt this will be the limit. Reading two or more pages at a time means less syscall overhead and shouldn't be slower until cache effects kick in. – tux3 Jan 28 '16 at 16:13
  • @tux Correct, I'll fix. Except I'd also add that the actual overhead of syscalls here is probably heavily outweighed by the actual access to storage devices. – Yam Marcovic Jan 28 '16 at 16:14
  • Agreed, at this point this is just reaching for the last extra percents. – tux3 Jan 28 '16 at 16:15
1

That parameter is there only to protect you from buffer overflows, so you must of course pass the size of the buffer you allocated for this purpose. Other than that, you should only read as many bytes as you are interested in at this exact time. A modern OS will always use the page cache, and any subsequent access to the file will be as fast as accessing RAM. You can also force the OS to cache the file beforehand if you need it whole.
Edit: My experience goes against what Yam Marcovic and others recommend. Caching files and chunking reads into ideal sizes is exactly what the OS is there to do. Do not presume to outsmart it; read just what you need (a minimal sketch follows).
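
A minimal C sketch of this answer's advice, assuming plain buffered sequential I/O: nNumberOfBytesToRead is simply the capacity of the buffer you allocated, and lpNumberOfBytesRead tells you how much you actually got. "test.bin" and the 64 KiB size are placeholders, not magic numbers:

#include <windows.h>

int main(void)
{
    HANDLE h = CreateFileA("test.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING,
                           FILE_FLAG_SEQUENTIAL_SCAN, /* hint for OS read-ahead */
                           NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    static BYTE buf[64 * 1024];            /* whatever size you allocated... */
    DWORD bytesRead;
    while (ReadFile(h, buf, sizeof buf,    /* ...is what you pass here */
                    &bytesRead, NULL) && bytesRead > 0) {
        /* process bytesRead bytes of buf here */
    }

    CloseHandle(h);
    return 0;
}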

user1316208