Read from file fast

Question

I have a txt file that contains 2 graphs and the number of vertices in the following format:

6
0 1 0 1 0 0
1 0 1 0 0 1
0 1 0 1 0 0
1 0 1 0 1 0
0 0 0 1 0 1
0 1 0 0 1 0
0 1 0 0 1 0
1 0 1 0 0 0
0 1 0 1 0 1
0 0 1 0 1 0
1 0 0 1 0 1
0 0 1 0 1 0

The matrices represent vertice adjacency. If two vertices are adjacent, their pair gets 1. Although the graphs are not separated visually, the second graph starts after the 6th row of the first. Each graph can have a lot of vertices, like 5000 and they are both of the same size (the graphs). I wrote an algorithm that checks if the two graphs are isomorphic and i noticed that reading the graphs takes 8 seconds and the actual algorithm takes 2.5 (for 5000 vertices). Since my goal is to optimize the overall speed of my program, I want to know if i can improve (in terms of speed) my current code of reading from file:

FILE* file = fopen ("input.txt", "r");
fscanf (file, "%d", &i);
int n = i;
while (!feof (file))
{  
    fscanf (file, "%d", &i); 
    if (j < (n*n)) {  // first graph
        if (i==1) {
            adj_1[j/n][v_rank_1[j/n]] = j - (j/n)*n; // add the vertice to the adjacents of the current vertice
            v_rank_1[j/n] += 1;
        }
    }
    else if (j>=(n*n)) {   // second graph
        if (i==1) {
            adj_2[(j-(n*n))/n][v_rank_2[(j-(n*n))/n]] = (j-(n*n)) - ((j-(n*n))/n)*n; // add the vertice to the adjacents of the current vertice
            v_rank_2[(j-(n*n))/n] += 1;
        }
    }
    j++;
}
fclose (file);

The adj_* table holds the indexes of the adjacent vertices of a vertice

The v_rank_* table holds the number of vertices adjacent to a vertice

It is important that I acquire this and only this information from the graph.

[This answer](http://stackoverflow.com/a/15863933/180736) is possibly related and may be of interest to you. — sorpigal, Jun 19 '15 at 19:22
Aside: I see you are using `feof` before attempting to read from file. This is wrong, please see http://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong and read the man page for `feof`. I suggest changing the loop to `while (1 != fscanf (file, "%d", &i))` — Weather Vane, Jun 19 '15 at 19:32
@WeatherVane Oh I'm sorry I didn't provide the previous code where I open the file and I read the number of vertices (it was way earlier in the code). Added it now — Tasos, Jun 19 '15 at 19:40
@Crone your added code does not forgive the use of `while (!feof())`. It's plain wrong, although off-topic. — Weather Vane, Jun 19 '15 at 19:48

Joël Hecht · Accepted Answer · 2015-07-04T07:42:16.230

The first optimization is to read the whole file in memory in one shot. Accessing memory in the loops will be faster than calling fread.

The second optimization is to do less arythmetic operations, even if it means more code.

Third optimization is treating the data from file as characters to avoid integer conversion.

The result could be:

// bulk read file into memory
fseek(file, 0, SEEK_END);
long fsize = ftell(file);
fseek(file, 0, SEEK_SET);
char *memFile = malloc(fsize + 1);
if (memFile == NULL) return; // not enough memory !! Handle it as you wish
fscanf(file, "%d", &n);
fread(memFile, fsize, 1, file);
fclose(file);
memfile[fsize] = 0;

// more code but less arythmetic operations
int lig, col;
char *mem = memFile, c;
for (int lig = 0; lig < n; lig++) { // first graph
    for (int col = 0; col < n; col++) {
        for (;;)
        {
            c = *mem;
            if (c == 0) break;
            mem++;
            if (c == '1') {
                adj_1[lig][v_rank_1[lig]++] = col; // add the vertice to the adjacents of the current vertice
                k++; // ??
                break;
            }
            if (c == '0') break;
        }

    }
}
for (int lig = 0; lig < n; lig++) { // second graph
    for (int col = 0; col < n; col++) {
            c = *mem;
            if (c == 0) break;
            mem++;
            if (c == '1') {
                adj_2[(lig][v_rank_2[lig]++] = col; // add the vertice to the adjacents of the current vertice
                l++;  // ??
                break;
            }
            if (c == '0') break;
        }
    }
}
free(memFile);

Remarks: you said nothing about variables k and l.

Holy @#! this method ran in 0.6 seconds compared to 8.0s-8.5s before. k and l is old code that's not used anymore. — Tasos, Jun 19 '15 at 22:05
Take away from this is that you are learning the tradeoffs in some algorithms and methodologies. Your original method used very little RAM but executed slowly because it accesses the file system so many times. Most of solutions proposed use more RAM but also execute much faster. Good luck. — jaybers, Jun 23 '15 at 12:58

score 2 · Answer 2 · answered Jun 19 '15 at 19:41

You could speed it up by accessing the file system less often. You are reading one integer at a time from the file thus accessing the file every time through the loop.

Instead, try reading the whole file or a large chunk of the file at once. (This is called block reading). You can buffer it into an array. Inside your loop, read from the memory buffer instead of the file. Refresh your memory buffer as needed inside the loop if you don't read in the entire file.

score 0 · Answer 3 · answered Jun 19 '15 at 19:43

Use fgets() to read a line at a time into a line buffer. Parse the line buffer into integer values.

This function reduces the number of times you read from the file, because behind the scenes, fgets() reads a large chunk of data from the file and returns a line at a time. It only attempts to read another chunk when there are no more lines left in its internal buffer.

Read from file fast

3 Answers3