1

I wrote a program that reads the characters of a file many times, in a loop. If I don't care about memory usage, is storing all the characters of the file in an array faster than accessing them with fgetc?

Spooky

2 Answers

4

In general, it's impossible to answer performance questions without knowing the details of the platform and the exact code you want to compare. However, in this case, buffering the file contents in an array is likely to be much faster on most platforms.

For one, disk is orders of magnitude slower than main memory.

And even if your OS (or libc) caches the data in RAM, fgetc still performs a system call to get it, which is likely much slower than a simple memory read.

Also because of the relative slowness of system calls, use fread instead of fgetc to read a block of bytes in a single call.
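As a rough illustration of reading the file into an array with fread, here is a minimal sketch. The helper name `read_whole_file`, the 4096-byte starting capacity, and the doubling growth policy are assumptions made for this example, not something taken from the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: load a whole file into a malloc'd array.
   Returns NULL on failure. On success the byte count is stored in
   *len_out and the caller must free() the returned pointer. */
unsigned char *read_whole_file(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096;  /* assumed starting capacity */
    size_t len = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    for (;;) {
        /* Fill the free part of the array with a single fread call. */
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)
            break;  /* short read: end of file or error */

        /* The array is full, so double it and keep reading. */
        unsigned char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            fclose(fp);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    *len_out = len;
    return buf;
}
```

After one call, the loop in the program can index `buf[i]` for `i < len` as many times as it likes without touching the stream again.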

Thomas
  • `fread` reads bytes, which are pretty much equivalent to `char`s ([citation](http://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1-or-at-least-char-bit-8)). – Thomas Jan 01 '16 at 17:20
  • `fgetc` still does buffering, there aren't any more system calls involved than if you'd do an `fread`. – fuz Jan 01 '16 at 17:21
  • Can I just replace the calls to fgetc with fread and cast the result? – Spooky Jan 01 '16 at 17:21
  • @FUZxxl Not necessarily -- I can't find this requirement in the standard. Even if it buffers, anecdotally [`fgetc` still turns out to be slower](http://stackoverflow.com/questions/13225014/why-fgetc-too-slow). – Thomas Jan 01 '16 at 17:24
  • @Spooky No, use `fread` to read the data straight into your array. You may need a cast. – Thomas Jan 01 '16 at 17:25
  • @Thomas That's why you should use `getc` if possible, but still, glibc is braindead and doesn't implement that as a macro. – fuz Jan 01 '16 at 17:41
  • @Thomas See ISO 9899:2011 §7.21.3 “(...) When a stream is *fully buffered*, characters are intended to be transmitted to or from the host environment as a block when a buffer is filled. (...)” This applies to all stdio functions, including `fgetc` and `fread`. On all but the most exotic platforms, streams are buffered by default although the standard leaves that open. – fuz Jan 01 '16 at 18:44
  • @Thomas Yes, one `fread` is faster than a series of `fgetc` calls, but `getc` calls are often still as fast as manually keeping track of another layer of buffering and often make for easier code. When speed is needed, consider `getc_unlocked()`. – fuz Jan 01 '16 at 18:44
1

I think you should at least use some form of buffering and not read a character at a time to fill the buffer or array.

It is better to use fread() to fill a buffer or array. If you want slightly more performance (your question is tagged performance too), you could even look into memory mapping (mmap), which avoids copying data from the disk cache in kernel mode to a buffer in user mode. For a single read pass, though, your hard disk will almost certainly be the bottleneck.
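A minimal sketch of the memory-mapping idea, assuming a POSIX system (mmap is not part of standard C). The helper names `map_file` and `count_newlines` are made up for this example, and the sketch treats an empty file as a failure because a zero-length mapping is not allowed.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a file read-only into memory.
   Returns NULL on failure or if the file is empty.
   Release the mapping with munmap((void *)ptr, *len_out). */
const unsigned char *map_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Example pass over the mapped bytes: count newline characters. */
size_t count_newlines(const unsigned char *data, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            count++;
    return count;
}
```

The mapped pointer can then be indexed like an ordinary array, and repeated passes reuse the same pages.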

If you only need to read the data once, fread() with buffer(s) might be the way to go.
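For the single-pass case, a sketch along these lines processes the file through one fixed-size buffer instead of holding it all in memory. The function name `count_byte` and the 64 KiB buffer size are assumptions chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical example: count how often `target` occurs in a stream,
   reading it once in blocks. Returns -1 if a read error occurred. */
long count_byte(FILE *fp, unsigned char target)
{
    unsigned char buf[65536];  /* assumed block size: 64 KiB */
    long count = 0;
    size_t n;

    /* Each fread call fetches up to one full block. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++)
            if (buf[i] == target)
                count++;
    }
    return ferror(fp) ? -1 : count;
}
```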

Danny_ds