You can obtain the size of a file with stat(), from <sys/stat.h>; e.g., see this SO question: How do you determine the size of a file in C? Once you have the file size, you can allocate a char array big enough to hold it.
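As a rough sketch of that first approach, assuming a POSIX system where stat() is available: the helper names file_size() and read_whole_file() are made up for illustration, and the error handling is deliberately minimal.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: size of `path` in bytes, or -1 if stat() fails. */
long long file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Hypothetical helper: reads the whole file into a malloc'd,
 * NUL-terminated buffer. The caller must free() the result.
 * Returns NULL on any error. */
char *read_whole_file(const char *path, size_t *out_len)
{
    long long size = file_size(path);
    if (size < 0)
        return NULL;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    char *data = malloc((size_t)size + 1);
    if (data == NULL) {
        fclose(fp);
        return NULL;
    }

    /* fread() may return fewer bytes than requested; use what we got */
    size_t got = fread(data, 1, (size_t)size, fp);
    fclose(fp);
    data[got] = '\0';
    if (out_len != NULL)
        *out_len = got;
    return data;
}
```

This works well for small files, but the memory cost grows with the file, which is the motivation for streaming instead.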
But you can also parse a file without reading it all into memory first: read a few bytes at a time into a small buffer and work with just those bytes. Further down is a quick-n-dirty implementation of that approach, based on your code.
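To show the read-a-little-at-a-time pattern on its own (separate from the word-counting program in this answer), here is a minimal sketch that counts how often one byte value occurs. The function name count_byte() and the 64-byte chunk size are arbitrary choices for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: counts occurrences of `target` in the stream,
 * holding at most 64 bytes of the file in memory at any time. */
long count_byte(FILE *fp, unsigned char target)
{
    unsigned char chunk[64];
    size_t got;
    long hits = 0;

    /* fread() returns the number of bytes actually read; 0 means
     * end of file (or an error), which ends the loop */
    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        for (size_t i = 0; i < got; i++) {
            if (chunk[i] == target)
                hits++;
        }
    }
    return hits;
}
```

Memory use stays constant no matter how large the input is; calling count_byte(fp, '\n') on an open file would count its newlines, for example.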
PLEASE NOTE: There are several ways to improve this code. For one, there should be more error checking; for another, you can use the strcmp() / strncmp() / strnicmp() family of functions to inspect the input buffer more efficiently (note that strnicmp() is a non-standard extension; POSIX offers strncasecmp() instead); for another, you can use command line arguments instead of hard-coded values (I did that, below; it was the only sane way I could feed a bunch of test input files in); for yet another, you can use e.g. buf[indx++] = ch as shorthand (the post-increment stores ch at buf[indx] and then advances indx); etc.
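As one possible take on the strncmp() suggestion, here is a sketch that checks a four-byte window for "is" between word boundaries. It lowercases with tolower() from <ctype.h> so that it does not depend on a non-standard case-insensitive compare. The names is_boundary() and window_has_is() are invented for this example.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: the same word boundaries the program uses. */
static int is_boundary(char c)
{
    return c == ' ' || c == ',' || c == '\n';
}

/* Hypothetical helper: returns 1 if buf[0..3] is
 * boundary, 'i', 's', boundary (letters in either case), else 0.
 * The caller must guarantee that buf holds at least 4 bytes. */
int window_has_is(const char *buf)
{
    char word[3];

    /* lowercase the two middle bytes so one strncmp() covers all cases */
    word[0] = (char)tolower((unsigned char)buf[1]);
    word[1] = (char)tolower((unsigned char)buf[2]);
    word[2] = '\0';

    return is_boundary(buf[0]) &&
           strncmp(word, "is", 2) == 0 &&
           is_boundary(buf[3]);
}
```

A call such as window_has_is(buf) could stand in for the long chain of character comparisons once four bytes have been collected.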
My main point with the code below is to help you start to think about file processing as a stream, rather than reading in the whole file up front. The comments others have added to your question are well worth noting, too. Hope this helps!
// count occurrences of the word 'is' in the input file
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) {
    FILE *fp;
    int count = 0;
    int times = 0;
    int ch = 0;    // int, not char: fgetc() returns an int so EOF stays distinguishable
    char buf[8];   // more than enough room to look for 'is' words
    int indx = 0;

    // make sure a filename was given and that the file can be opened
    if (argc < 2) {
        fprintf(stderr, "usage: %s <input-file>\n", argv[0]);
        return 1;
    }
    fp = fopen(argv[1], "r");
    if (fp == NULL) {
        perror(argv[1]);
        return 1;
    }

    // fill the input buffer with nul bytes
    memset(buf, 0, sizeof buf);
    indx = 0;

    // pretend that the input file starts with ' ', in order
    // to detect 'is' at the start of the file
    buf[indx] = ' ';
    indx++;

    while ((ch = fgetc(fp)) != EOF) {
        count++;
        buf[indx] = (char)ch;
        indx++;

        // uncomment this to see the progression of 'buf' as
        // the input file is being read
        //printf("buf is : [%s]\n", buf);

        // if the input buffer does not begin with a word
        // boundary, start the input buffer over by resetting
        // it and looping back to the top of the reading loop
        if (buf[0] != ' ' && buf[0] != ',' && buf[0] != '\n') {
            memset(buf, 0, sizeof buf);
            indx = 0;
            continue;
        }

        // if a word boundary arrives before the buffer is full, the
        // current word is shorter than 'is' (or empty, e.g. ", ");
        // restart the buffer at this boundary so that the next word
        // stays aligned with the start of the buffer
        if (indx > 1 && indx < 4 && (ch == ' ' || ch == ',' || ch == '\n')) {
            memset(buf, 0, sizeof buf);
            buf[0] = ' ';
            indx = 1;
            continue;
        }

        // if we have read 4 characters (indx 0 through indx 3),
        // it's time to look to see if we have an 'is'
        if (indx == 4) {
            // if we have 'is' between word boundaries, count it
            if ((buf[0] == ' ' || buf[0] == ',' || buf[0] == '\n') &&
                (buf[1] == 'i' || buf[1] == 'I') &&
                (buf[2] == 's' || buf[2] == 'S') &&
                (buf[3] == ' ' || buf[3] == ',' || buf[3] == '\n')) {
                times++;
            }

            // reset the input buffer
            memset(buf, 0, sizeof buf);
            indx = 0;

            // if we ended with a word boundary, preserve it as the
            // word boundary at the beginning of the next word
            if (ch == ' ' || ch == ',' || ch == '\n') {
                buf[indx] = ' ';
                indx++;
            }
        }
    }

    // EOF is also a word boundary, so we do one final check to see
    // if there is an 'is' at the end of the file
    if ((buf[0] == ' ' || buf[0] == ',' || buf[0] == '\n') &&
        (buf[1] == 'i' || buf[1] == 'I') &&
        (buf[2] == 's' || buf[2] == 'S')) {
        times++;
    }

    fclose(fp);

    printf("input file is %d characters long\n", count);
    printf("the string IS appeared %d times in the input file\n", times);
    return 0;
}
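The same streaming idea also works without a window buffer. One alternative, sketched here as an option rather than as the way the program above does it, is to remember only how long the current word is and whether it still matches "is". The function name count_is() is hypothetical; the word boundaries are the same three characters used above.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: the same word boundaries as the program above. */
static int is_boundary(int c)
{
    return c == ' ' || c == ',' || c == '\n';
}

/* Hypothetical helper: counts words equal to "is" (any letter case)
 * in the stream, one character at a time, with no input buffer. */
long count_is(FILE *fp)
{
    int c;
    int len = 0;     /* letters seen in the current word, capped at 3 */
    int match = 1;   /* 1 while the current word is still a prefix of "is" */
    long times = 0;

    for (;;) {
        c = fgetc(fp);
        if (c == EOF || is_boundary(c)) {
            /* a word just ended (EOF counts as a boundary too) */
            if (len == 2 && match)
                times++;
            len = 0;
            match = 1;
            if (c == EOF)
                break;
        } else {
            if (len < 2 && tolower(c) != "is"[len])
                match = 0;
            if (len < 3)
                len++;   /* 3 means "longer than two letters" */
        }
    }
    return times;
}
```

Because the state is just two small integers, words of any length and runs of consecutive boundaries need no special cases.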
Additional information about argc and argv (re: comment question)
argc is the number of command line arguments, counting the program name itself; argv is an array of pointers to those arguments. By convention, argv[0] points to the command itself (i.e., the name of the executing program). argc is often used to check for a minimum number of command line arguments, as a limit to loop over the command line arguments, as a test before using argv[n], etc. Sometimes, you will see argv specified as char *argv[], which of course operates the same way as char **argv.
So, the line fp = fopen(argv[1], "r"); uses the 1st command line argument as the filename of the input file. E.g., in my tests, I compiled this code as countis and executed it with countis countis-input-test-001. (I had a series of test input files, and used a shell script to process each one, to test each edit I made to the program.)
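To make the argc test before using argv[1] concrete, here is a small sketch. Both function names are invented for this example; a real program would call them from main(int argc, char **argv) with the values it received.

```c
#include <stdio.h>

/* Hypothetical helper: returns the first real argument (argv[1])
 * if one was supplied, or NULL if only the program name is present. */
const char *input_path_from_args(int argc, char **argv)
{
    if (argc < 2)
        return NULL;
    return argv[1];
}

/* Hypothetical helper: prints every argument with its index and
 * returns how many were printed. argv[0] is the program name. */
int print_args(int argc, char **argv)
{
    int i;
    for (i = 0; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
    return i;
}
```

With the invocation countis countis-input-test-001, argc is 2, so input_path_from_args() returns the string countis-input-test-001, which can then be passed to fopen(). With no arguments it returns NULL, and the program can print a usage message instead of dereferencing a missing argv[1].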
Here are a couple of places to read more and see code examples using argc and argv:
https://www.tutorialspoint.com/cprogramming/c_command_line_arguments.htm
http://www.teach.cs.toronto.edu/~ajr/209/notes/argv.html
You can also google "c programming argc argv" or similar for many more resources.