3

I am writing a C program that involves reading an image file and visiting each pixel exactly once. Should I read the file once with fread() and store it in a dynamically allocated (heap) buffer, or call fread() repeatedly, once per pixel? The image will be between 1000*1000 and 5000*5000 pixels. I will later be extending the same program to MPI and CUDA. I would appreciate any other suggestions. Thank you.

singhsumit
    It will definitely be slow reading a pixel at a time, but even that might not be noticeable if your computation is already very expensive. However, this doesn't mean you need to read the whole file into memory. You could easily use memory-mapped files (`mmap`) or just read reasonable-size chunks at a time (4-16k should be plenty to compensate for the overhead cost of reading). – R.. GitHub STOP HELPING ICE Mar 30 '11 at 18:26

8 Answers

4

Even a 12-bit-per-channel ARGB image would need about 150 MB for a 5,000 * 5,000 pixel resolution, which is well within the capabilities of all current PCs and even many GPU cards. If you have that kind of memory available, you should read it once into a dynamically allocated array, or something along those lines. That would allow you to read the whole image in big I/O blocks, which is faster, and to use direct memory operations (img[1234][4321][RED] = 34) rather than complicate your code with I/O functions.
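As a sketch, the read-it-all-at-once approach might look like this (the function name is illustrative and error handling is deliberately minimal):

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a heap buffer in one big I/O.
 * Returns NULL on failure; on success *out_size receives the file size. */
static unsigned char *read_whole_file(const char *path, size_t *out_size)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (size < 0) {
        fclose(f);
        return NULL;
    }

    unsigned char *buf = malloc((size_t)size);
    if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);        /* short read: give up rather than return junk */
        buf = NULL;
    }
    fclose(f);
    if (buf)
        *out_size = (size_t)size;
    return buf;
}
```

After this, accessing pixel (x, y) is plain pointer arithmetic on the buffer instead of an I/O call.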

If you do not have that kind of memory available, look at mmap() or whatever equivalent exists for your OS to map the file into virtual memory. You still have the advantage of using direct memory operations, without necessarily loading the whole thing in memory, although it would be computationally more expensive.
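A minimal POSIX sketch of the mmap() route (Linux and similar systems; error handling trimmed, function name illustrative):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only into virtual memory. The OS pages data in on
 * demand, so the whole file need never be resident at once. */
static const unsigned char *map_file(const char *path, size_t *out_size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the fd is closed */
    if (p == MAP_FAILED)
        return NULL;

    *out_size = (size_t)st.st_size;
    return p;   /* release with munmap() when done */
}
```

Pixels can then be addressed directly through the returned pointer, exactly as with a malloc'd buffer.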

That said, modern OSes perform extensive caching and prefetching of data, so using fread() may not be that much slower. Moreover, on current Linux systems with glibc 2.3 or later, it is optionally possible to use mmap() for file access even when the application performs I/O through the standard stdio functions.

thkala
1

It depends. You should try and estimate the amount of memory on most computers that will run your software. It also depends on how speed critical your code is.

Obviously, one approach is faster while the other uses much more memory. In general, you are probably okay loading it into memory on most modern computers and that's easier. But you have to weigh the pros and cons in your particular case.

Jonathan Wood
1

Generally I've found the quickest way to deal with files is to try to read the whole thing into memory in one big I/O, and deal with it out of memory from then on in. It often makes the code simpler too.

You do of course have to worry about files that might not fit in any available contiguous memory chunk. If you handle that properly (rather than just bail) the code becomes much more complex. As a certified lazy programmer, I prefer to just bail if I can get away with it. :-)

T.E.D.
1

Here's another question that may help you make a decision: How exactly does fopen(), fclose() work?

If you're looking for speed, it would be best to load the entire file into memory at once and manipulate it there. That way you avoid unnecessary calls to your hard disk driver to provide the data. When you start talking about providing 25,000,000 different 4-byte chunks (assuming 32-bit RGBA) for a 5k image, you're looking at potentially a lot of seeking, reading, and waiting.

This is one of the classic memory-vs-speed tradeoffs. If your customers will have enough memory, then it would be best to load all the data into memory and then perform your transformations.

Otherwise, try to load enough data at a time (paging) so that it's fast and still fits the memory profile you're targeting.
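The paging approach could look roughly like this in C; the 64 KiB chunk size and the byte-sum "processing" step are placeholders for whatever per-pixel work you actually do:

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE (64 * 1024)  /* 64 KiB per read; tune for your target */

/* Stream a file through a fixed-size buffer, processing each chunk as it
 * arrives, so memory use stays constant regardless of file size.
 * The "processing" here is just a byte sum, standing in for real work. */
static unsigned long sum_file_chunked(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;

    unsigned char buf[CHUNK_SIZE];
    unsigned long sum = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];

    fclose(f);
    return sum;
}
```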

klyd
1

It depends on what kind of algorithm you need to run. An image of 5000 * 5000 pixels is around 95 MB at 4 bytes per pixel. Not a big deal.

On the GPU side you can asynchronously upload to GPU memory in blocks of around 4 MB-16 MB to saturate the bandwidth:

#pseudocode:

 for chunk in fread(file, 4MB):
     gpu.uploadAsync(chunk)  # will not block
 gpu.execute()  # waits until all previous memory transfers have completed

You have to use pinned (page-locked) memory with CUDA, and I suspect that if you memory-map the file, copying the blocks will be even faster.

As usual profile your application for the best tuning.

fabrizioM
0

Look at using mmap() under Linux or MapViewOfFile() under Windows.

Angelom
0

Storing it in memory will definitely be faster. If you read small chunks from the hard drive every time, you always incur delays due to minimum access times, etc.

AVH
0

I was going to write this up as a comment, but it became too long. But on to the point...

I agree with T.E.D. and Jonathan Wood:

Generally I've found the quickest way to deal with files is to try to read the whole thing into memory in one big I/O, and deal with it out of memory from then on in. It often makes the code simpler too.

-T.E.D

It depends. You should try and estimate the amount of memory on most computers that will run your software. It also depends on how speed critical your code is.

Obviously, one approach is faster while the other uses much more memory. In general, you are probably okay loading it into memory on most modern computers and that's easier. But you have to weigh the pros and cons in your particular case

-Jonathan Wood

Keep in mind that 5000*5000 pixels with 32-bit colors takes up roughly 100 megabytes of memory (plus maybe some overhead, and whatever your software otherwise needs). I'd say (best guess, Stetson-Harrison value) most modern desktop computers have at least 1 or 2 gigabytes of memory (mine was bought in 2008 and has 4), so it's not that much really, even if the whole thing is loaded at once; laptops might have less memory.

The CUDA aspect is also interesting (I know next to nothing about CUDA): is the data loaded into the GPU's memory? How much memory do CUDA-enabled GPUs usually have? Could the PCI-e bus become a bottleneck (probably not..?)? Find out how much memory common CUDA-enabled desktop and laptop GPUs have.

A sort of compromise might be to buffer the reading: have one thread "read ahead" in the file while one or more other threads process the data, freeing memory as they go.
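That read-ahead idea can be sketched with two buffers and a POSIX thread; the buffer size, the byte-sum "processing", and all names here are illustrative, not a production design:

```c
#include <pthread.h>
#include <stdio.h>

#define RA_BUF_SIZE (64 * 1024)

/* Double-buffered read-ahead: a reader thread fills one buffer from the
 * file while the consumer processes the other. */
struct readahead {
    FILE *f;
    unsigned char buf[2][RA_BUF_SIZE];
    size_t len[2];            /* bytes currently in each buffer */
    int ready[2];             /* buffer filled, waiting to be consumed */
    int eof;
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

static void *reader_thread(void *arg)
{
    struct readahead *ra = arg;
    int slot = 0;
    for (;;) {
        pthread_mutex_lock(&ra->lock);
        while (ra->ready[slot])            /* wait until slot is drained */
            pthread_cond_wait(&ra->cond, &ra->lock);
        pthread_mutex_unlock(&ra->lock);

        size_t n = fread(ra->buf[slot], 1, RA_BUF_SIZE, ra->f);

        pthread_mutex_lock(&ra->lock);
        ra->len[slot] = n;
        if (n == 0)
            ra->eof = 1;                   /* nothing more to publish */
        else
            ra->ready[slot] = 1;
        pthread_cond_broadcast(&ra->cond);
        pthread_mutex_unlock(&ra->lock);

        if (n == 0)
            return NULL;
        slot ^= 1;                         /* ping-pong to other buffer */
    }
}

/* Consume buffers in the same order the reader fills them.
 * Returns a byte sum as a stand-in for real processing. */
static unsigned long consume_all(struct readahead *ra)
{
    unsigned long sum = 0;
    int slot = 0;
    for (;;) {
        pthread_mutex_lock(&ra->lock);
        while (!ra->ready[slot] && !ra->eof)
            pthread_cond_wait(&ra->cond, &ra->lock);
        if (!ra->ready[slot] && ra->eof) { /* all data processed */
            pthread_mutex_unlock(&ra->lock);
            return sum;
        }
        pthread_mutex_unlock(&ra->lock);

        for (size_t i = 0; i < ra->len[slot]; i++)  /* "process" */
            sum += ra->buf[slot][i];

        pthread_mutex_lock(&ra->lock);
        ra->ready[slot] = 0;               /* hand slot back to reader */
        pthread_cond_broadcast(&ra->cond);
        pthread_mutex_unlock(&ra->lock);
        slot ^= 1;
    }
}
```

While one buffer is being processed, the reader is already filling the other, so disk waits overlap with computation.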

esaj