
I'm working on a C program (Ubuntu 14.04) that does basically:

  • Opens a 1GB file
  • Reads it in 1 MB chunks
  • Looks for some objects in the buffer
  • Computes the MD5 signature of each object found

My program takes 10 seconds the first time it runs, and then only 1 second on subsequent runs (even if I work on a second copy of the initial file).

I know that this has something to do with caching. Does my program work on cached data after the first run, or does it directly show cached results without doing any computation?

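    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define BUFFER_SIZE (1024 * 1024)   /* 1 MB chunks, as described above */

    /* find_object() and md5_compute() are defined elsewhere in the program. */

    int main(int argc, char** argv) {
        unsigned char buffer[BUFFER_SIZE];
        size_t i, number;
        int count = 0;
        int start, end = 0;
        FILE *file;

        file = fopen("/dump/ram.lime", "rb");   /* binary mode: the file is a raw dump */
        if (file != NULL) {
            /* Read the file chunk by chunk until fread() returns 0. */
            while ((number = fread(buffer, 1, BUFFER_SIZE, file)) > 0) {
                for (i = 0; i < number; i++) {
                    find_object(buffer, &start, &end);
                    md5_compute(&buffer[start], end - start);
                }
            }
            fclose(file);
        } else {
            printf("errno %d \n", errno);
        }
        printf("count = %d \n", count);
        return (EXIT_SUCCESS);
    }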

Yacine Hebbal

1 Answer


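Because the second time, most of your program code and most of the file data are already sitting in the page cache, so the kernel does not need any disk I/O to get them into RAM. Your program still performs all of its computation on every run; only the slow disk reads are avoided.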

You will likely observe a similar speedup if, before running your own code, you run any other program that reads that big file sequentially (like cat or wc).
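To see the effect in isolation, here is a minimal sketch that times a plain sequential read of a file with no other processing. The helper name timed_read is hypothetical (it is not part of the question's code), and the 1 MB chunk size simply mirrors the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: reads `path` sequentially in 1 MB chunks and returns
 * the elapsed wall-clock time in seconds, or -1.0 if the file cannot be
 * opened. The number of bytes read is stored in *total. */
double timed_read(const char *path, size_t *total)
{
    static unsigned char buffer[1 << 20];
    struct timespec t0, t1;
    size_t n;
    FILE *file = fopen(path, "rb");

    if (file == NULL)
        return -1.0;

    *total = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = fread(buffer, 1, sizeof buffer, file)) > 0)
        *total += n;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    fclose(file);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling timed_read twice in a row on the same large file should, on a machine with enough free RAM, report a much shorter time for the second call, since that read is served from the page cache.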

See also posix_fadvise(2), sync(2), the Linux-specific readahead(2), and http://www.linuxatemyram.com/. Use the free(1) command before, during, and after running your program to see how much memory is used by the cache. Read also about proc(5), since /proc/ contains a lot of useful pseudo-files describing the kernel state of your machine or your process.
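As an illustration of posix_fadvise(2), the sketch below asks the kernel to drop the cached pages of one file, which can help make a benchmark start "cold" again. The function name drop_file_cache is hypothetical. The call is only a hint: the kernel may keep some pages, and dirty pages that have not been written back yet are not discarded.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: advise the kernel that the cached pages of `path`
 * are no longer needed. Returns 0 on success, -1 if the file cannot be
 * opened, or the positive error number returned by posix_fadvise(). */
int drop_file_cache(const char *path)
{
    int rc;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* offset 0 and length 0 mean "the whole file". */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return rc;
}
```

Running this on the input file between two runs of the program should bring the second run closer to the first-run timing, if the cache really is the cause of the difference.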

Use also time(1), perhaps as /usr/bin/time -v, to benchmark your program several times. See also time(7) and getrusage(2).
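For measuring from inside the program, a minimal sketch based on getrusage(2) could look like this. The function name print_io_usage is hypothetical. On Linux, the ru_inblock and ru_majflt counters give a rough idea of whether data had to come from the disk or was already in RAM.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: print CPU time, page faults and block I/O counters
 * of the calling process to `out`. Returns 0 on success, -1 on failure. */
int print_io_usage(FILE *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    fprintf(out, "user %ld.%06ld s, system %ld.%06ld s\n",
            (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
            (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    fprintf(out, "major faults %ld, minor faults %ld\n",
            ru.ru_majflt, ru.ru_minflt);
    fprintf(out, "block input ops %ld, block output ops %ld\n",
            ru.ru_inblock, ru.ru_oublock);
    return 0;
}
```

Calling it just before the program returns, for example as print_io_usage(stderr), lets you compare a first run against later ones: a run served from the page cache would be expected to show few or no block input operations.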

Basile Starynkevitch