
I am trying to figure out how to count the number of times a specific string "test" occurs in a text file using C. I want the program to display the final count upon completion.

This is the code I have come up with, but it doesn't seem to do the trick. The count it produces is slightly lower than the number of occurrences actually present in the text file.

Does anyone see what I'm doing wrong? I'm fairly new to C programming, so any insight would be greatly appreciated!

#include<stdio.h>
#include<string.h>

int main()
{
    FILE *ptr_file;
    char buf[200];
    char key[] = "test"; // the string I am searching for
    int wordcount = 0;

    ptr_file = fopen("input.txt","r"); // my input text file

    while (fgets(buf,200, ptr_file)!=NULL)
    {
        if((strstr(buf,key)) !=NULL){
            wordcount++;
        }
    }
    fclose(ptr_file);
    printf("%d",wordcount);
}
sheebs
  • It compiles for me with one minor warning. What compiler are you using and what is the error? – Carey Gregory Nov 26 '11 at 04:31
  • @Dani: Just saying the algorithm is wrong isn't helpful. I suspect S.S. is assuming that strstr will find undelimited strings and that's the error. – Carey Gregory Nov 26 '11 at 04:34
  • Leading questions for more correctness issues: what does your implementation do when the search text occurs more than once in a line? What happens if a line has 200 characters or more? – outis Nov 26 '11 at 04:34
  • What happens if the line contains the sequence 'testestestest'? What should happen? – Jonathan Leffler Nov 26 '11 at 04:47
  • 1 banana, 2 banana, 3 banana, 4. – outis Nov 26 '11 at 04:57
  • The error I am receiving is: warning: incompatible implicit declaration of built-in function 'strstr'. I would like the program to count every occurrence in the text file and increment the word count each time. – sheebs Nov 26 '11 at 05:15
  • The input file will consist of multiple lines that are a maximum of 200 characters, hence the restriction. So I want my program to go through each line, count the occurrences of the word "test" and increment the counter. I want the program to return the final count. – sheebs Nov 26 '11 at 05:20
  • To use strstr, you need #include <string.h> – BLUEPIXY Nov 26 '11 at 05:22
  • @S.S.: note that the message you posted is a warning, not an error. Moreover, it (and any other clarifications) should be edited into the question. Questions should be understandable without reading comments. SO uses a Q&A, not forum, format. Should your program search for words (e.g. it wouldn't match "testing", "attest" or "contestant"), or any occurrence of the substring? – outis Nov 26 '11 at 05:34
  • Just the word "test" by itself. It will not be encapsulated within other words. – sheebs Nov 26 '11 at 05:39
  • See [Why do I get a warning everytime I use malloc?](http://stackoverflow.com/questions/1230386/) for another question about the same warning. – outis Nov 26 '11 at 05:42
  • @S.S.: again, please edit clarifications into your question rather than posting them as comments. Comments aren't well-suited for discussion. – outis Nov 26 '11 at 05:51

2 Answers

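#include <stdio.h>
#include <string.h>

/* Count non-overlapping occurrences of word in the file at file_path.
   Returns -1 if the file cannot be opened. */
int wc(const char *file_path, const char *word){
    FILE *fp;
    int count = 0;
    int ch, len;

    /* Binary mode: a relative fseek with a nonzero offset
       is not portable on a text stream. */
    if(NULL==(fp=fopen(file_path, "rb")))
        return -1;
    len = (int)strlen(word);
    for(;;){
        int i;
        if(EOF==(ch=fgetc(fp))) break;
        if((char)ch != *word) continue; /* not the first character of word */
        for(i=1;i<len;++i){
            if(EOF==(ch = fgetc(fp))) goto end;
            if((char)ch != word[i]){
                /* Mismatch: i characters were read after the candidate's
                   first character. Back up over all of them so the scan
                   resumes right after the candidate start (an offset of
                   1-i would skip a character and miss "test" in "ttest"). */
                fseek(fp, -i, SEEK_CUR);
                goto next; /* acts as "continue" for the outer loop */
            }
        }
        ++count; /* whole word matched; scanning resumes after the match */
        next: ;
    }
end:
    fclose(fp);
    return count;
}

int main(void){ /* "testestest" gives a count of 2 */
    char key[] = "test"; // the string I am searching for
    int wordcount = 0;

    wordcount = wc("input.txt", key);
    printf("%d",wordcount);
    return 0;
}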
BLUEPIXY
  • That code may work but it sure is a trip back down memory lane to 1980 or thereabouts. – Carey Gregory Nov 27 '11 at 07:22
  • If you mean the gotos, they only act as a break and a continue for the outer loop. – BLUEPIXY Nov 27 '11 at 09:02
  • Sometimes I think why this disease has just What. – BLUEPIXY Nov 27 '11 at 09:09
  • @Carey Gregory BTW, what would that look like in 2011? – BLUEPIXY Nov 27 '11 at 09:22
  • @BLUEPIXY: I wasn't criticizing the use of the goto. I have no problem with them when used appropriately. It's just that algorithm looks extremely familiar. I believe I saw it, or something almost identical to it, in some token parsing code many years ago. – Carey Gregory Nov 28 '11 at 02:32
  • @BLUEPIXY: In 2011 I would simply change his if-statement to something like this: { char *p = buf; while ((p = strstr(p, key))) { ++wordcount; ++p; } } – Carey Gregory Nov 30 '11 at 19:06
  • @Carey Gregory, you aren't accounting for a match that straddles two buffers. Mean age at 1980.5 – BLUEPIXY Nov 30 '11 at 22:03
  • I was mistaken, in technique, but rather of being able to use lots of memory is enough to be read the entire file into a buffer, answer in 2011. – BLUEPIXY Nov 30 '11 at 22:22
  • Because I'm very interested in programming, there is a difference of 30 years and the program was very interested in what. So really a pity. The fundamental solution just because it is used a lot of memory on it is not cause other problems. – BLUEPIXY Dec 02 '11 at 09:54

strstr is declared in the string.h header. If you don't include string.h, strstr is undeclared in your source file, and the compiler implicitly declares it as returning an int and taking unspecified arguments (that is, as if it were declared int strstr()). The real strstr returns a char *, so the implicit declaration doesn't match the function in the standard C library that your program is linked against, hence the warning.

The solution is simple: make sure you include string.h.

As for the problem of multiple occurrences of a search string in a line, note the first paragraph in the description section of the strstr man page:

The strstr() function finds the first occurrence of the substring needle in the string haystack. The terminating null bytes ("\0") are not compared.

While you can use strstr to find multiple substrings, you'd need to loop over the string, using a different starting location each time. Depending on where you start, it could match previously matched portions of the string (e.g. "testest" would count as 2 matches) or only against unmatched portions (e.g. "testest" would count as 1).
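
The loop described above might look like the following minimal sketch. The names count_substr and allow_overlap are made up for illustration and are not part of the original question or of any library; the only standard calls used are strlen and strstr.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of needle in haystack using repeated strstr calls.
 * allow_overlap != 0: resume one character past each match start,
 *                     so "testest" contains 2 occurrences of "test".
 * allow_overlap == 0: resume just past the end of each match,
 *                     so "testest" contains 1 occurrence.
 * (Hypothetical helper, written for this example.) */
size_t count_substr(const char *haystack, const char *needle, int allow_overlap)
{
    size_t count = 0;
    size_t len = strlen(needle);
    const char *p = haystack;

    if (len == 0)
        return 0; /* an empty needle would match everywhere; report none */

    while ((p = strstr(p, needle)) != NULL) {
        ++count;
        p += allow_overlap ? 1 : len; /* choose where the next search starts */
    }
    return count;
}
```

In the question's program, the body of the if statement could be replaced by something like wordcount += count_substr(buf, key, 0); so that a line containing several matches contributes more than 1 to the total.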

If you wish to count the occurrences of a complete word and not just a substring, strstr isn't very useful. One option is to use strpbrk or strcspn to find word (i.e. alphabetic) characters and strspn to find non-word characters. With these, you can find the first character of a word, compare to the search string and, if it matches, test that the next character isn't alphabetic. If it isn't, increment the count; if it is, go to the next word. Alternatively, you can loop over each character and use isalpha to distinguish letters from non-letters (hence, beginnings and endings of words).
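
As a rough sketch of the isalpha approach, the function below walks a line, isolates each run of letters, and compares the whole run to the search word. The name count_whole_word is hypothetical, the comparison is case-sensitive, and the sketch assumes the search word itself consists only of letters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count the runs of letters in line that are exactly equal to word.
 * "testing", "attest" and "contestant" do not count as "test".
 * (Hypothetical helper; assumes word contains only letters.) */
size_t count_whole_word(const char *line, const char *word)
{
    size_t count = 0;
    size_t len = strlen(word);
    const char *p = line;

    if (len == 0)
        return 0;

    while (*p != '\0') {
        const char *start;

        /* Skip non-letters to reach the beginning of the next word. */
        while (*p != '\0' && !isalpha((unsigned char)*p))
            ++p;

        /* Advance over the letters; isalpha('\0') is false, so this
           stops at the end of the string as well. */
        start = p;
        while (isalpha((unsigned char)*p))
            ++p;

        /* The word spans [start, p); it matches only if both the
           length and the characters agree. */
        if ((size_t)(p - start) == len && memcmp(start, word, len) == 0)
            ++count;
    }
    return count;
}
```

The casts to unsigned char matter because passing a negative char value to isalpha is undefined behavior.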

Another option is to split the input into a list of words, then scan the word list for your search word. String tokenizing functions will do this, though they alter the buffer you pass in. You can also use fscanf to read a word at a time from the file. This has the added advantage of correctly handling long lines.
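
The fscanf variant could be sketched as below. The function name count_tokens and the 64-byte token buffer are assumptions made for this example. Note that %s splits on whitespace only, so a token such as "test." (with trailing punctuation) will not compare equal to "test", and a token longer than 63 characters is read in several pieces.

```c
#include <stdio.h>
#include <string.h>

/* Count whitespace-delimited tokens read from fp that equal word exactly.
 * (Hypothetical helper; the caller opens and closes the stream.) */
long count_tokens(FILE *fp, const char *word)
{
    char token[64];
    long count = 0;

    /* %63s skips leading whitespace (including newlines) and stores at
       most 63 characters plus the terminating null byte, so line length
       does not matter and the buffer cannot overflow. */
    while (fscanf(fp, "%63s", token) == 1) {
        if (strcmp(token, word) == 0)
            ++count;
    }
    return count;
}
```

A caller would open the file with fopen("input.txt", "r"), check the result for NULL, pass the stream to count_tokens, and then fclose it.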

outis
  • I added the #include and it compiles now. What would you recommend instead of strstr if I wanted to count the occurrences of a complete word? And also, thanks for all your help! I really appreciate it. – sheebs Nov 26 '11 at 05:44
  • It is not implicitly declared to take no arguments; it is implicitly declared to take an undefined set of arguments. – Jonathan Leffler Nov 26 '11 at 05:59