I want to write a C program that counts how many times a string occurs in a text file, using the strstr() function.

I have written a test program with strstr(), but I have a problem.

For example, given the sentence "this is a text", searching for "is" reports "is found 2 times", because it also counts the "is" inside "this". I don't want that match; I only want "is" as a whole word. Can I avoid this problem while still using strstr(), with some changes?

#include <stdio.h>
#include<string.h>
int main()
{
    char*ptr;
    char input[]=("this is a text");
    char key[10];
    int counter;
    scanf("%s",key);
    ptr=strstr(input,key);
    while (ptr==NULL)
    {
        printf("not found\n");
        break;
    }
    while(ptr!=NULL)
    {
        counter++;
        ptr=strstr(ptr+1,key);
    }
    printf("%s found %d times\n",key,counter);
    return 0;
}
Bas
  • You haven't said what your problem is. – AntonH May 22 '14 at 23:04
  • I suspect even your language has full stops at the end of its sentences. – Lightness Races in Orbit May 22 '14 at 23:05
  • For example, when I search for the word "is", it counts the word "is" 2 times. – user3666993 May 22 '14 at 23:09
  • @user3666993 And that's what it does. Unless you want the word "is" without any alpha character in front or behind, but you haven't specified this. – AntonH May 22 '14 at 23:11
  • @user3666993 You can avoid that by checking that you're either at the beginning of the string, or the character preceding the match returns true for `isspace()`. Also, `counter` is uninitialized. Learn to pay attention to compiler warnings! – Praetorian May 22 '14 at 23:14
  • I want to find only "is"; I do not want matches that are part of other words containing "is". Sorry again for my English. – user3666993 May 22 '14 at 23:15
  • th[is] [is] a text: legitimately 2 times. `strstr()` did its work fine; however, you should write your own function that checks for those corner cases. – Coldsteel48 May 22 '14 at 23:17
  • If you're at the beginning of the string and see an i, check for the s, then make sure there is no alpha character (letter) after it. If there is no alpha character, that's +1 "is". If you aren't at the beginning, check the character before the i and the one after the s. If neither is an alpha character, that's another +1 "is". If you're at the end of the string, check that the character before the i is not alpha and that the s is followed by the null terminator ('\0') or a period (if the sentence ends with a full stop). Do these checks whenever you find a match. – aglasser May 22 '14 at 23:19

2 Answers

That's the expected behavior: strstr() doesn't match whole words; it is simply a substring matcher that finds any occurrence of the search string. For your requirements, you need to write a custom matcher that searches for whole words.

One way to do it is:

     1- Read the file character by character, skipping all non-alpha characters.
     2- Start matching the word you are searching for, character by character, until either
         - you mismatch a character: skip the remaining alpha characters of the current word and go back to step 1, or
         - you have matched the whole word: if the next character in the file is non-alpha (or you have reached the end of the file), increment your counter; otherwise skip the remaining alpha characters.
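
The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the function name count_word_in_stream is hypothetical, a "word" is taken to be a maximal run of alphabetic characters, the search key consists of letters only, and matching is case-sensitive.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts whole-word occurrences of `key` in the stream `fp`.
 * Assumes `key` contains only letters; a word is a maximal run of isalpha() characters. */
long count_word_in_stream(FILE *fp, const char *key)
{
    size_t key_len = strlen(key);
    size_t matched = 0;  /* leading characters of the current word that match key */
    int in_word = 0;     /* currently inside a run of alpha characters? */
    int mismatch = 0;    /* current word already known not to equal key? */
    long count = 0;
    int c;

    if (key_len == 0)
        return 0;

    while ((c = fgetc(fp)) != EOF) {
        if (isalpha(c)) {
            if (!in_word) {          /* a new word starts here */
                in_word = 1;
                matched = 0;
                mismatch = 0;
            }
            if (!mismatch && matched < key_len && c == (unsigned char)key[matched])
                matched++;
            else
                mismatch = 1;        /* wrong character, or word longer than key */
        } else {
            /* a non-alpha character ends the current word */
            if (in_word && !mismatch && matched == key_len)
                count++;
            in_word = 0;
        }
    }
    /* the file may end in the middle of a word */
    if (in_word && !mismatch && matched == key_len)
        count++;

    return count;
}
```

A caller would open the text file with fopen(), pass the resulting FILE pointer together with the search word, and close the file afterwards. Because the file is read one character at a time, the length of a line does not matter.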

http://www.cplusplus.com/reference/cstring/strstr/

mmohab

I'm guessing that you want to do some form of regular expression matching in C. In your example, you're looking for "\bis\b" (the whole word), rather than the characters "is" (in which case, you're getting the correct results).

You can look into using some form of regex library. These links have some info:

Regular expressions in C: examples?

http://www.lemoda.net/c/unix-regex/
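
As a rough sketch of the regex route, the following uses the POSIX <regex.h> interface (regcomp, regexec, regfree). Since `\b` is not part of POSIX extended regular expressions, the word boundaries are spelled out with bracket expressions instead. The function name count_word_regex is hypothetical, and the sketch assumes the search word contains only letters and digits, so it needs no escaping.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts whole-word occurrences of `word` in `text`.
 * Assumes `word` has no regex metacharacters. Returns -1 on error. */
long count_word_regex(const char *text, const char *word)
{
    char pattern[256];
    regex_t re;
    regmatch_t m[3];          /* m[0] = whole match, m[1] = left boundary, m[2] = the word */
    const char *cursor = text;
    int eflags = 0;
    long count = 0;
    int n;

    if (word[0] == '\0')
        return 0;

    /* left boundary: start of text or a non-word character;
     * right boundary: a non-word character or end of text */
    n = snprintf(pattern, sizeof pattern,
                 "(^|[^[:alnum:]_])(%s)([^[:alnum:]_]|$)", word);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, cursor, 3, m, eflags) == 0) {
        if (m[2].rm_eo <= 0)  /* defensive: never loop without advancing */
            break;
        count++;
        /* resume right after the word, so that its trailing separator
         * can serve as the left boundary of the next match */
        cursor += m[2].rm_eo;
        eflags = REG_NOTBOL;  /* cursor is no longer at the start of the text */
    }

    regfree(&re);
    return count;
}
```

With the text from the question, count_word_regex("this is a text", "is") should count only the standalone "is".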

Or you can look into implementing specifically what you're looking for. Something along the lines of

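/* needs #include <ctype.h> for isspace(); counter must start at 0 */
size_t len = strlen(key);
while (ptr != NULL)
{
    if (ptr == &input[0]) {
        /* match at the very start: only the character after it matters */
        if (isspace((unsigned char)ptr[len]) || ptr[len] == '\0') {
            counter++;
        }
    } else {
        /* otherwise require whitespace before, and whitespace or end of string after */
        if (isspace((unsigned char)ptr[-1]) && (isspace((unsigned char)ptr[len]) || ptr[len] == '\0')) {
            counter++;
        }
    }
    ptr = strstr(ptr + 1, key);
}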

Note: I know that this code is far from being optimised, but I think it's code that works and is pretty self-explanatory.
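
For reference, the same idea can be wrapped in a self-contained function. This is a sketch, not a tuned implementation: the names is_boundary and count_whole_word are hypothetical, and unlike the loop above it treats any non-alphanumeric character (not just whitespace) as a word boundary, so that "is." or "(is)" also count.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `ch` cannot be part of a word.
 * The cast avoids undefined behaviour for negative char values. */
static int is_boundary(char ch)
{
    return ch == '\0' || !isalnum((unsigned char)ch);
}

/* Hypothetical helper: counts occurrences of `key` in `text` that are
 * not directly preceded or followed by a letter or digit. */
size_t count_whole_word(const char *text, const char *key)
{
    size_t len = strlen(key);
    size_t count = 0;
    const char *p;

    if (len == 0)
        return 0;

    for (p = strstr(text, key); p != NULL; p = strstr(p + 1, key)) {
        /* p[-1] is only read when p is not the first character */
        int start_ok = (p == text) || is_boundary(p[-1]);
        int end_ok = is_boundary(p[len]);
        if (start_ok && end_ok)
            count++;
    }
    return count;
}
```

With the input from the question, count_whole_word("this is a text", "is") skips the "is" inside "this" and counts only the standalone word.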

Community
AntonH