Your program is also likely suffering from a form of I/O amplification where it rereads the same data over and over and over.
This is your main file-reading loop:
```c
n=fgetc(fptr);
while((feof(fptr)==0)){
    if(n==toupper(word[i])||n==tolower(word[i])){
        count++;
        i++;
    }
    else if(n!=word[i]){
        if(count>1){
            fseek(fptr, -count, SEEK_CUR);
        }
        count=0;
        i=0;
    }
    if(count==l){
        total++;
        count=0;
        i=0;
    }
    n=fgetc(fptr);
}
```
Reducing that to only I/O calls:
```c
n=fgetc(fptr);
while((feof(fptr)==0)){
    if(n!=word[i]){
        if(count>1){
            fseek(fptr, -count, SEEK_CUR);
        }
        count=0;
        i=0;
    }
    n=fgetc(fptr);
}
```
What is happening:

- You open the file in read-only mode.
- Since the stream is buffered, when you first call `fgetc()` your program actually reads the file from its current offset and fills its buffer. That means your program can read several kB at once (usually 4 kB or 8 kB, depending on your system).
- Your program then makes a number of `fgetc()` calls, each of which returns a `char` value (held in an `int`) to your code. Most of the time, that `char` is simply copied from the buffer associated with `fptr`.
- Your program calls `fseek()`. Depending on the C library, that call can discard the buffered data.
- On your next call to `fgetc()`, your program fills its buffer again, usually rereading data it has already read.
Depending on how often your program calls `fseek()`, it could end up reading hundreds or even thousands of times more data than the file actually contains.
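To see the rereading directly, here is a sketch of the loop from the question with counters added (`scan_stats` and `scan_like_original` are made-up names for illustration). It counts how many characters `fgetc()` hands back and how many `fseek()` calls are made; anything beyond the file's size was read more than once. The only intended changes are the counters, `(unsigned char)` casts for `toupper()`/`tolower()`, and testing `n != EOF` so a read error can't loop forever.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Counters gathered while running the original algorithm. */
struct scan_stats {
    long chars_read; /* characters returned by fgetc() */
    long seeks;      /* fseek() calls made */
    long total;      /* matches found */
};

/* Same matching logic as the question's loop, instrumented. */
struct scan_stats scan_like_original(FILE *fptr, const char *word)
{
    struct scan_stats st = {0, 0, 0};
    int count = 0;
    size_t i = 0;
    size_t l = strlen(word);
    int n = fgetc(fptr);

    while (n != EOF) {
        st.chars_read++;
        /* casts avoid undefined behavior for negative char values */
        if (n == toupper((unsigned char)word[i]) ||
            n == tolower((unsigned char)word[i])) {
            count++;
            i++;
        } else if (n != word[i]) {
            if (count > 1) {
                /* "back up" so the next fgetc() rereads old data */
                fseek(fptr, -count, SEEK_CUR);
                st.seeks++;
            }
            count = 0;
            i = 0;
        }
        if ((size_t)count == l) {
            st.total++;
            count = 0;
            i = 0;
        }
        n = fgetc(fptr);
    }
    return st;
}
```

For example, searching for `aab` in the four-byte input `aaab` makes one `fseek()` call, and `fgetc()` returns 6 characters. Note that this counts characters delivered by `fgetc()`, not bytes fetched from the OS; the latter also depends on the stdio buffer size.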
It's not quite as bad as it seems, though, because most of those reads are hopefully satisfied by your system's page cache rather than going all the way to the disk. But each `fseek()` call typically costs an extra system call (plus another one to refill the buffer), and that overhead, along with the per-call cost of reading one `char` at a time with `fgetc()`, is likely slowing down your program considerably.
Simply reading large chunks of data with something like `fread()` will work, but because you "back up" in the data stream (your `fseek()` calls), you have to account for the possibility of "backing up" into the previous chunk of data. That's a bit difficult and tedious to do reliably.
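A minimal sketch of that bookkeeping, assuming the goal is a case-insensitive count of `word`: instead of seeking back, it keeps the last `strlen(word) - 1` bytes of each buffer and places them in front of the next chunk, so a match that straddles two chunks is still found. `count_word_chunked`, `ci_equal` and `CHUNK_SIZE` are hypothetical names, and unlike the original loop this counts overlapping matches (`aa` occurs twice in `aaa`).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 4096 /* assumption: any size >= strlen(word) works */

/* Case-insensitive comparison of n bytes. */
static int ci_equal(const char *a, const char *b, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        if (tolower((unsigned char)a[k]) != tolower((unsigned char)b[k]))
            return 0;
    }
    return 1;
}

/* Returns the number of (possibly overlapping) matches, or -1 if the
   word length is outside what this sketch supports. */
long count_word_chunked(FILE *fp, const char *word)
{
    size_t len = strlen(word);
    if (len == 0 || len > CHUNK_SIZE)
        return -1;

    char buf[2 * CHUNK_SIZE]; /* carried tail + one new chunk */
    size_t carried = 0;       /* bytes kept from the previous buffer */
    long total = 0;

    for (;;) {
        size_t got = fread(buf + carried, 1, CHUNK_SIZE, fp);
        if (got == 0)
            break; /* EOF or read error */
        size_t avail = carried + got;

        /* count every match that fits entirely inside buf */
        for (size_t s = 0; s + len <= avail; s++) {
            if (ci_equal(buf + s, word, len))
                total++;
        }

        /* A match starting in the last len - 1 bytes could not fit, so
           it was not counted yet: keep those bytes for the next pass. */
        carried = (avail < len - 1) ? avail : len - 1;
        memmove(buf, buf + avail - carried, carried);
    }
    return total;
}
```

The carried-over tail is the "backing up" done in memory, so no byte is read from the file more than once.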
The easiest solution, if words never continue across two lines, is to read line by line using `fgets()` (or `getline()` on POSIX systems):
```c
for (;;)
{
    // define MAX_LINE_LENGTH to a suitable value; a longer line
    // is returned in pieces by successive fgets() calls
    char line[ MAX_LINE_LENGTH ];
    char *result = fgets( line, sizeof( line ), fptr );

    // EOF (or error - either way there's no more data to be read)
    if ( result == NULL )
    {
        break;
    }

    // remove newline (if you want); strcspn() is in <string.h>
    line[ strcspn( line, "\n" ) ] = '\0';

    // now process a line of text
    // ...
}
```
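On POSIX systems, `getline()` grows its buffer as needed, so there is no `MAX_LINE_LENGTH` to choose. A sketch under that assumption, where `count_lines` is a hypothetical helper that just counts lines and records the longest one as a stand-in for real processing:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads every line of fp with getline(); returns the number of lines
   and stores the length of the longest one (without its newline). */
long count_lines(FILE *fp, size_t *longest)
{
    char *line = NULL; /* getline() allocates and resizes this */
    size_t cap = 0;    /* current capacity of line */
    long lines = 0;

    *longest = 0;
    while (getline(&line, &cap, fp) != -1) {
        /* remove newline (if you want) */
        line[strcspn(line, "\n")] = '\0';

        /* now process a line of text */
        size_t n = strlen(line);
        if (n > *longest)
            *longest = n;
        lines++;
    }
    free(line); /* the buffer must be freed even after getline() fails */
    return lines;
}
```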
Reading in lines also allows the use of standard functions such as `strtok()` to split the input into separate words, and then the POSIX function `strncasecmp()` to find case-insensitive matches to the word you're looking for.
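Putting those pieces together, here is a sketch of a whole-word, case-insensitive counter. It assumes that words are separated by whitespace and common punctuation (the `delims` set is a guess), that no line exceeds `MAX_LINE_LENGTH - 1` characters, and that `count_word` is a hypothetical name. Because `strncasecmp()` only compares the first `n` characters, the token length is checked as well; otherwise `cathedral` would count as a match for `cat`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <strings.h> /* strncasecmp() is POSIX, not ISO C */

#define MAX_LINE_LENGTH 1024 /* assumption: every line fits */

/* Counts whole-word, case-insensitive occurrences of word in fp. */
long count_word(FILE *fp, const char *word)
{
    const char *delims = " \t\r\n.,;:!?\"'()";
    size_t wlen = strlen(word);
    char line[MAX_LINE_LENGTH];
    long total = 0;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* strtok() modifies line in place, splitting it into tokens */
        for (char *tok = strtok(line, delims); tok != NULL;
             tok = strtok(NULL, delims)) {
            if (strlen(tok) == wlen && strncasecmp(tok, word, wlen) == 0)
                total++;
        }
    }
    return total;
}
```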