
I have a very large set of binary files where several thousand raw frames of video are being sequentially read and processed, and I’m now looking to optimize it as it appears to be more CPU-bound than I/O-bound.

The frames are currently being read in this manner, and I suspect this is the biggest culprit:

private byte[] frameBuf;
BinaryReader binRead = new BinaryReader(FS);

// Initialize a new buffer of sizeof(frame)
frameBuf = new byte[VARIABLE_BUFFER_SIZE];
// Read sizeof(frame) bytes from the file
frameBuf = binRead.ReadBytes(VARIABLE_BUFFER_SIZE);

Would it make much of a difference in .NET to re-organize the I/O to avoid creating all these new byte arrays with each frame?

My understanding of .NET's memory allocation mechanism is weak, as I am coming from a pure C/C++ background. My idea is to rewrite this to use a shared static buffer class containing one very large buffer plus an integer that tracks the frame's actual size. But I love the simplicity and readability of the current implementation, and would rather keep it if the CLR already handles this in some way I am not aware of.
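Roughly what I have in mind (FrameBuffer, MaxFrameSize and FrameSize are just placeholder names for this sketch, and the size is a made-up upper bound):

// One large reusable buffer shared by all reads, plus an int recording
// how many bytes of it the current frame actually occupies.
static class FrameBuffer
{
    public const int MaxFrameSize = 4 * 1024 * 1024;             // assumed upper bound on a frame
    public static readonly byte[] Buffer = new byte[MaxFrameSize];
    public static int FrameSize;                                  // valid bytes for the current frame
}

// Per frame: fill the shared buffer instead of allocating a new array
FrameBuffer.FrameSize = binRead.Read(FrameBuffer.Buffer, 0, VARIABLE_BUFFER_SIZE);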

Any input would be much appreciated.

rnd
  • Have you run a profiler to ensure the performance hit doesn't come from other sources? Or did you just go and assume "so that's probably it"? – David Pratte Aug 18 '10 at 19:11
  • Hi David, I ran the performance profiler on it a few times and this particular method is my most expensive one. Therefore I'm looking to see if this "new byte[]" method is an obvious performance killer in .NET. As a C programmer, this looks analogous to thousands of "malloc" statements for each buffer, which would definitely be slower than a re-used buffer. – rnd Aug 18 '10 at 20:16

2 Answers


You don't need to initialize frameBuf if you use binRead.ReadBytes -- you get back a brand-new byte array, and assigning it simply replaces (and orphans) the one you allocated on the previous line. It does still create a new array for each read, though.

If you want to avoid creating a bunch of byte arrays, you could use binRead.Read, which will put the bytes into an array you supply to it. If other threads are using the array, though, they'll see the contents of it change right in front of them. Be sure you can guarantee you're done with the buffer before reusing it.
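Something along these lines (MAX_FRAME_SIZE, path and ProcessFrame are placeholders here; adjust to however you size and consume your frames):

using System.IO;

// One buffer, allocated once, sized for the largest frame you expect.
byte[] frameBuf = new byte[MAX_FRAME_SIZE];

using (var binRead = new BinaryReader(File.OpenRead(path)))
{
    int bytesRead;
    // Read fills the buffer you pass in and returns how many bytes it actually read
    // (0 at end of stream), so no new array is created per frame.
    while ((bytesRead = binRead.Read(frameBuf, 0, VARIABLE_BUFFER_SIZE)) > 0)
    {
        ProcessFrame(frameBuf, bytesRead);   // only the first bytesRead bytes are valid
    }
}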

cHao
  • Thanks for pointing this out -- I'm sure my redundant allocation is slowing this down appreciably. And the static shared array is exactly what I'm considering, but if the performance gain isn't large compared to creating the byte arrays, I'd much rather stick with the elegant solution for the very same complication you outline (shared access). – rnd Aug 18 '10 at 20:21

You need to be careful here. It is very easy to get completely bogus test results on code like this, results that never repro in real use. The problem is the file system cache: it caches the data you read from a file. The trouble starts when you run your test over and over again, tweaking the code and looking for improvements.

The second and subsequent times you run the test, the data no longer comes off the disk. It is still present in the cache, so it only takes a memory-to-memory copy to get it into your program. That's very fast: a microsecond or so of overhead plus the time needed to copy, which runs at bus speeds, at least 5 gigabytes per second on modern machines.

Your test will now reveal that you spend a lot of time allocating the buffer and processing the data, relative to the amount of time spent reading the data.

This will rarely repro in real use. The data won't be in the cache yet, so the sluggish disk drive needs to seek to the data (many milliseconds) and it needs to be read off the disk platter (a couple of dozen megabytes per second, at best). Reading the data now takes a good three or four orders of magnitude longer. If you manage to make the processing step twice as fast, your program will actually only run 0.05% faster. Give or take.
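If you want to see this effect in your own numbers, time the read and the processing separately and compare a cold-cache run (the first time the file is touched) against a warm one. A minimal sketch, assuming your loop is already split into a read step and a ProcessFrame step (ProcessFrame is a placeholder for your processing code):

var readTimer = new System.Diagnostics.Stopwatch();
var processTimer = new System.Diagnostics.Stopwatch();

readTimer.Start();
int bytesRead = binRead.Read(frameBuf, 0, VARIABLE_BUFFER_SIZE);
readTimer.Stop();

processTimer.Start();
ProcessFrame(frameBuf, bytesRead);
processTimer.Stop();

// Cold cache: readTimer dominates. Warm cache: it collapses to almost nothing,
// which is exactly what makes allocation and processing look so expensive in a profiler.
Console.WriteLine("read: {0} ms, process: {1} ms",
    readTimer.ElapsedMilliseconds, processTimer.ElapsedMilliseconds);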

Hans Passant
  • That's a good point, however I'm running my tests on a dataset that dwarfs my machine's memory by several gigabytes. What concerns me is the fact that similar code in my old C++ library would process this dataset in less than half the time this takes. However, I did notice the profiler is warning that approx. 2,826 pages/s are being written to disk and that the app may be memory-bound. I am not explicitly disposing of any of these arrays – could these be getting cached before the GC de-allocates them? – rnd Aug 18 '10 at 20:46
  • 2
  • These buffers are probably large, more than 85 KB, which gets them allocated on the Large Object Heap (LOH). They'll be stuck there for a while; it takes a gen #2 collection to reclaim them. Nothing comes for free, and reusing buffers when they are large is a good strategy in .NET as well. – Hans Passant Aug 18 '10 at 20:51
  • If you want to force the file to load from disk, clear the Windows file cache, as seen in this question: http://stackoverflow.com/q/478340/80525 – yagni Feb 05 '14 at 20:10