
The code below counts the occurrences of "is" in a text file, but it fixes the size of the file buffer beforehand. Please help me modify the program so that it can count the word "is" in a file whose length is unknown. The length of the array file should be equal to the length of the text file.

// Count of occurrence of word 'is' in file WordFile.

#include<stdio.h>
#include<conio.h>
#include<string.h>

//function to append

void append(char* s, char c)
{
        int len = strlen(s);
        s[len] = c;
        s[len+1] = '\0';
}

void main()
{
    FILE *fp;
    int i=0,count=0,j,k,space,times=0;
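    int ch;                 // int, not char: fgetc() returns int so EOF can be told apart from data
    char file[1000] = "";   // start as an empty string so strlen() in append() is well defined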

    fp = fopen("../WordFile.txt","r");

    while ((ch=fgetc(fp)) != EOF)
    {
        count++;
        append(file,ch);


    }

    printf("Count of file is %d \n",count);

    printf("%s \n",file);

    for(i=0;i<(count-3);i++)
    {
        j = (file[i] == 'i'  || file[i] == 'I');

        k = (file[i+1] == 's' || file[i+1] == 'S');

        space = (file[i+2] == ' ' || file[i+2] == ',' || file[i+2] == EOF);

        if( (j && k && space ) == 1 )
            times ++;
    }

    printf("the string IS appeared %d times in the given file. \n", times);
    getch();

}
Santosh

1 Answer


You can obtain the size of a file with stat(), from <sys/stat.h>; e.g., see this SO question: How do you determine the size of a file in C? Once you have the file size, you could allocate a char array big enough to hold it.
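As a rough illustration of the stat() route, here is a minimal sketch. It assumes a POSIX-like system where <sys/stat.h> is available; the helper name read_whole_file and its out_len parameter are made up for this example and are not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: read the whole file at `path` into a heap buffer
 * sized from stat(). Returns a NUL-terminated buffer that the caller
 * must free(), or NULL on failure. *out_len receives the number of
 * bytes actually read. */
char *read_whole_file(const char *path, size_t *out_len)
{
    struct stat st;
    FILE *fp;
    char *buf;
    size_t n;

    assert(path != NULL && out_len != NULL);

    /* ask the file system for the size before opening the file */
    if (stat(path, &st) != 0 || st.st_size < 0)
        return NULL;

    fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    buf = malloc((size_t)st.st_size + 1);   /* +1 for the terminating NUL */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* fread() may return fewer bytes than st_size (for example with
     * text-mode newline translation), so use its return value as the
     * real length. */
    n = fread(buf, 1, (size_t)st.st_size, fp);
    buf[n] = '\0';
    fclose(fp);

    *out_len = n;
    return buf;
}
```

With something like this, the fixed char file[1000] array could be replaced by the returned buffer, and the search loop would run up to the returned length instead of a hard-coded size.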

But, you can also parse through a file withOUT reading it all into memory first. You read a few bytes at a time into a small buffer and work with just those bytes. Below is a quick-n-dirty implementation of that approach, based on your code.

PLEASE NOTE: There are several ways to improve this code. For one, there should be more error checking. For another, you can use the strcmp() / strncmp() family of functions (or a case-insensitive variant such as the non-standard strnicmp(), or strncasecmp() on POSIX systems) to inspect the input buffer more efficiently. You can also use command line arguments instead of hard-coded values (I did that, below; it was the only sane way I could feed a bunch of test input files in). And you can use e.g. buf[indx++] = ch as shorthand, because indx++ post-increments.
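Since strnicmp() is not part of standard C, a portable case-insensitive prefix check can be sketched with tolower() from <ctype.h>. The function name starts_with_icase is an assumption for this example only:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if `s` begins with `prefix`, ignoring
 * case, and 0 otherwise. */
int starts_with_icase(const char *s, const char *prefix)
{
    assert(s != NULL && prefix != NULL);

    while (*prefix != '\0') {
        /* cast to unsigned char: tolower() is undefined for negative values.
         * If `s` ends first, its '\0' will not match and we return 0. */
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}
```

A call such as starts_with_icase(buf + 1, "is") could then stand in for the two separate 'i'/'I' and 's'/'S' comparisons.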

My main point with the code below is to help you start to think about file processing as a stream, rather than reading in the whole file up front. The comments others have added to your question are well worth noting, too. Hope this helps!

// count of occurrences of word 'is' in input file

#include<stdio.h>
#include<string.h>

int main(int argc, char** argv) {
    FILE *fp;
    int count = 0;
    int times = 0;

    int ch = 0;     // int, not char: fgetc() returns int so EOF can be told apart from data
    char buf[8];    // more than enough room to look for 'is' words
    int indx = 0;

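    // the input file name is expected as the first command line argument
    if (argc < 2) {
        fprintf(stderr, "usage: %s <input-file>\n", argv[0]);
        return 1;
    }

    fp = fopen(argv[1], "r");
    if (fp == NULL) {
        perror(argv[1]);
        return 1;
    }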

    // fill the input buffer with nul bytes
    memset(buf, 0, 8);
    indx = 0;

    // pretend that the input file starts with ' ', in order
    // to detect 'is' at the start of the file
    buf[indx] = ' ';
    indx++;

    while ((ch = fgetc(fp)) != EOF) {
        count++;

        buf[indx] = ch;
        indx++;

        // uncomment this to see the progression of 'buf' as
        // the input file is being read
        //printf("buf is : [%s]\n", buf);

        // if the input buffer does not begin with a word
        // boundary, start the input buffer over by resetting
        // it and looping back to the top of the reading loop
        if (buf[0] != ' ' && buf[0] != ',' && buf[0] != '\n') {
            memset(buf, 0, 8);
            indx = 0;
            continue;
        }

        // if we have read 4 characters (indx 0 through indx 3),
        // it's time to look to see if we have an 'is'
        if (indx == 4) {
            // if we have 'is' between word boundaries, count it
            if ((buf[0] == ' ' || buf[0] == ',' || buf[0] == '\n') &&
                (buf[1] == 'i' || buf[1] == 'I') &&
                (buf[2] == 's' || buf[2] == 'S') &&
                (buf[3] == ' ' || buf[3] == ',' || buf[3] == '\n')) {
                times++;
            }

            // reset the input buffer
            memset(buf, 0, 8);
            indx = 0;

            // if we ended with a word boundary, preserve it as the
            // word boundary at the beginning of the next word
            if (ch == ' ' || ch == ',' || ch == '\n') {
                buf[indx] = ' ';
                indx++;
            }
        }
    }
    // EOF is also a word boundary, so we do one final check to see
    // if there is an 'is' at the end of the file
    if ((buf[0] == ' ' || buf[0] == ',' || buf[0] == '\n') &&
        (buf[1] == 'i' || buf[1] == 'I') &&
        (buf[2] == 's' || buf[2] == 'S')) {
        times++;
    }

    printf("input file is %d characters long\n", count);
    printf("the string IS appeared %d times in the input file\n", times);

    fclose(fp);
    return 0;
}
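One caveat with the fixed four-character window above: it can miss an 'is' that follows a short word (for example, the input "a is"), because the window is reset in the middle of the following word. As an alternative sketch of the same streaming idea, the function below keeps a small state variable instead of a buffer. The name count_is and the state numbering are assumptions for this example; the word boundaries are the same three characters used above (space, comma, newline), plus the start and end of the file.

```c
#include <assert.h>
#include <stdio.h>

/* same word boundaries as the code above */
static int is_boundary(int c)
{
    return c == ' ' || c == ',' || c == '\n';
}

/* Hypothetical helper: count standalone, case-insensitive "is" words
 * in an already opened stream, reading one character at a time. */
long count_is(FILE *fp)
{
    /* state: 0 = at a word start, 1 = word so far is "i",
     *        2 = word so far is "is", 3 = inside some other word */
    int state = 0;
    int ch;             /* int, so EOF can be told apart from data */
    long times = 0;

    assert(fp != NULL);

    while ((ch = fgetc(fp)) != EOF) {
        if (is_boundary(ch)) {
            if (state == 2)     /* the word that just ended was "is" */
                times++;
            state = 0;
        } else if (state == 0 && (ch == 'i' || ch == 'I')) {
            state = 1;
        } else if (state == 1 && (ch == 's' || ch == 'S')) {
            state = 2;
        } else {
            state = 3;          /* any other character: not "is" */
        }
    }

    /* EOF is also a word boundary */
    if (state == 2)
        times++;

    return times;
}
```

Because nothing is stored beyond the current state, the size of the input file never matters, which is the point of treating the file as a stream.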

Additional information about argc and argv (re: comment question)

argc is the number of command line arguments; argv is a set of pointers to those command line arguments.

argv[0] always points to the command itself (i.e., the name of the executing program). argc is often used to check for a minimum number of command line arguments, as a limit to loop over the command line arguments, as a test before using argv[n], etc. Sometimes, you will see argv specified as char *argv[], which of course operates the same way as char **argv.

So, the line fp = fopen(argv[1], "r"); uses the 1st command line argument as the filename of the input file. e.g., in my tests, I compiled this code as countis and executed it with countis countis-input-test-001. (I had a series of test input files, and used a shell script to process each one, to test each edit I made to the program.)
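To make the argc test before using argv[1] concrete, here is a small sketch. The helper name input_path is made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the first command line argument, or NULL
 * when the program was started without one. */
const char *input_path(int argc, char **argv)
{
    assert(argv != NULL);

    /* argv[0] is the program name, so a file name needs argc >= 2 */
    return (argc >= 2) ? argv[1] : NULL;
}
```

In main(), you would call input_path(argc, argv) and print a usage message if it returns NULL, rather than passing a missing argument straight to fopen(). Running the program without a file name is one plausible reason for an unexpected result.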

Here are a couple of places to read more and see code examples using argc and argv:

https://www.tutorialspoint.com/cprogramming/c_command_line_arguments.htm
http://www.teach.cs.toronto.edu/~ajr/209/notes/argv.html

You can also google c programming argc argv or similar for many more similar resources.

landru27
  • Hi Landru, thanks for the reply. Could you please explain the use of int argc and char** argv in the above program? Also, I am not giving the name of the file in which I want to count the word anywhere. I tried executing the given program but got 0 as output. – Santosh May 17 '18 at 17:24
  • @Santosh : sure thing; in fact, both of your questions are tightly related; **argc** holds the value for the number of command line arguments, and **argv** is a set of pointers to those command line arguments; `fp = fopen(argv[1], "r");` obtains the input file name from the first command line argument; I'll post an edit to my answer with a bit more info – landru27 May 17 '18 at 22:32
  • `fgetc` returns `int` by intention! – alk May 20 '18 at 07:44