
Can I use the `strstr` function to match an exact word? For example, let's say I have the word `hello` and this input string `line`:

```c
char* line = "hellodarkness my old friend";
```

and I use

```c
result = strstr(line, "hello");
```

`result` will match (be non-NULL). However, I want to match only the exact word "hello" (so that "hellodarkness" would not match), in which case `result` should be NULL. Is it possible to do this using `strstr`, or do I have to use `fscanf` and scan the line word by word, checking for matches?

A. Wali
  • `strstr()` returns a pointer to the match, not just true or false... if you find a match, just check that the characters before and after it aren't alphabetic (or alphanumeric if you like) -- if they aren't, you've got a valid match, and if they are, try again from just past the returned pointer. – Dmitri Feb 20 '17 at 19:38
  • [The docs for `strstr()`](https://linux.die.net/man/3/strstr) explain its behavior. All it does is look for an occurrence of one string as a substring of another. If such a substring is found, `strstr()` itself conveys no information about the characters *surrounding* the appearance(s) of the substring, but you can look around yourself. – John Bollinger Feb 20 '17 at 19:40
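As a minimal sketch of what these comments describe (the helper name `char_after_match` is hypothetical), `strstr()` happily locates "hello" inside "hellodarkness"; the caller then has to inspect the character right after the returned pointer to learn that the occurrence is not a whole word:

```c
#include <string.h>

/* Hypothetical helper: returns the character that immediately follows the
   first occurrence of word in line (0 for the terminating '\0'), or -1 if
   strstr() finds no occurrence. strstr() itself reports nothing about this
   character; we look at it ourselves. */
int char_after_match(const char *line, const char *word)
{
    const char *result = strstr(line, word);
    if (result == NULL)
        return -1;
    return (unsigned char)result[strlen(word)];
}
```

For the question's input, `char_after_match("hellodarkness my old friend", "hello")` returns `'d'`, an alphanumeric character, so that occurrence is not the word "hello" on its own.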

3 Answers


Here is a generic function for your purpose. It returns a pointer to the first match or NULL if none can be found:

```c
#include <ctype.h>
#include <string.h>

/* Return a pointer to the first whole-word occurrence of word in str, or NULL. */
char *word_find(const char *str, const char *word) {
    const char *p = NULL;
    size_t len = strlen(word);

    if (len > 0) {
        for (p = str; (p = strstr(p, word)) != NULL; p++) {
            if (p == str || !isalnum((unsigned char)p[-1])) {
                if (!isalnum((unsigned char)p[len]))
                    break;  /* we have a match! */
                p += len;   /* next match is at least len+1 bytes away
                               (true when word is purely alphanumeric) */
            }
        }
    }
    return (char *)p;  /* cast away const, as strstr() does */
}
```
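A usage sketch for `word_find()`, exercised on inputs discussed in this thread. The function is repeated (with the `(char *)` cast on the return value, mirroring how `strstr()` returns its result) only so the snippet compiles on its own; the test strings are illustrative:

```c
#include <ctype.h>
#include <string.h>

/* Same logic as word_find() above: scan strstr() hits until one is
   preceded and followed by a non-alphanumeric character (or a string end). */
char *word_find(const char *str, const char *word) {
    const char *p = NULL;
    size_t len = strlen(word);

    if (len > 0) {
        for (p = str; (p = strstr(p, word)) != NULL; p++) {
            if (p == str || !isalnum((unsigned char)p[-1])) {
                if (!isalnum((unsigned char)p[len]))
                    break;  /* whole-word match */
                p += len;   /* skip past this occurrence */
            }
        }
    }
    return (char *)p;
}
```

For example, `word_find("hellohello hello abc", "hello")` skips the leading "hellohello" and returns a pointer to offset 11, which is the case reported as failing in the comments under the next answer, while `word_find("hellodarkness my old friend", "hello")` returns NULL.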
chqrlie

I would:

  • check if string is in sentence
  • if found at the start (same pointer as `line`), add the length of the word and check whether the next character is alphanumeric. If not (or it is the null terminator), it's a match
  • if found anywhere else, add the extra "no alphanum before" test

code:

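```c
#include <stdio.h>
#include <string.h>   /* strstr(), strlen() (not <strings.h>) */
#include <ctype.h>    /* isalnum() */

int main(void)
{
  const char *line = "hellodarkness my old friend";
  const char *word_to_find = "hello";
  const char *p = strstr(line, word_to_find);
  /* the match must be at the start of line or preceded by a non-alphanumeric char */
  if ((p == line) || (p != NULL && !isalnum((unsigned char)p[-1])))
  {
     p += strlen(word_to_find);
     /* ...and followed by a non-alphanumeric char (or the terminating '\0') */
     if (!isalnum((unsigned char)*p))
     {
       printf("Match\n");
     }
  }
  return 0;
}
```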

Here it doesn't print anything, but insert a space or punctuation character after "hello" (or terminate the string right after it) and you'll get a match. Conversely, inserting alphanumeric characters before "hello" prevents a match.
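To try these variants without editing `line` each time, the same single-occurrence check can be wrapped in a function. `first_occurrence_is_word()` is a hypothetical name, and like the program above it only examines the first occurrence returned by `strstr()`:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: 1 if the FIRST occurrence of word in line is a whole
   word, 0 otherwise (including when word does not occur at all). */
int first_occurrence_is_word(const char *line, const char *word)
{
    const char *p = strstr(line, word);
    if ((p == line) || (p != NULL && !isalnum((unsigned char)p[-1])))
    {
        p += strlen(word);
        return !isalnum((unsigned char)*p);
    }
    return 0;
}
```

`first_occurrence_is_word("hello darkness", "hello")` returns 1, whereas `first_occurrence_is_word("hellohello hello", "hello")` returns 0 even though a whole-word "hello" appears later; that is the limitation the EDIT that follows addresses.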

EDIT: the above code works when there's only one "hello", but it fails to find the second "hello" in "hellohello hello". So we have to add a loop that keeps looking for the word until `strstr` returns NULL, advancing `p` each time, like this:

```c
#include <stdio.h>
#include <string.h>   /* strstr(), strlen() (not <strings.h>) */
#include <ctype.h>    /* isalnum() */

int main(void)
{
  const char *line = "  hellohello hello darkness my old friend";
  const char *word_to_find = "hello";
  const char *p = line;

  for (;;)
  {
    p = strstr(p, word_to_find);
    if (p == NULL) break;   /* no more occurrences */

    if ((p == line) || !isalnum((unsigned char)p[-1]))
    {
       p += strlen(word_to_find);
       if (!isalnum((unsigned char)*p))
       {
         printf("Match\n");
         break;  // found, quit
       }
    }
    // substring was found, but no word match, move by 1 char and retry
    p += 1;
  }

  return 0;
}
```
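If you need every whole-word occurrence rather than just the first, the loop can keep going after a match. The sketch below is a hypothetical variant (`count_word()` is not from the answer). Unlike the loop above, it always advances by exactly one character, so it cannot skip a valid match when the searched word itself contains non-alphanumeric characters:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical variant: count every whole-word occurrence of word in line. */
int count_word(const char *line, const char *word)
{
    size_t len = strlen(word);
    const char *p = line;
    int count = 0;

    if (len == 0)
        return 0;               /* an empty word would "match" everywhere */

    while ((p = strstr(p, word)) != NULL)
    {
        int starts_ok = (p == line) || !isalnum((unsigned char)p[-1]);
        int ends_ok = !isalnum((unsigned char)p[len]);
        if (starts_ok && ends_ok)
            count++;
        p += 1;                 /* retry one character further on */
    }
    return count;
}
```

For example, `count_word("hello, hello hello!", "hello")` returns 3, and `count_word("  hellohello hello darkness my old friend", "hello")` returns 1.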
Jean-François Fabre
  • Also need to check the character before the match, if it wasn't the beginning of the string. – Dmitri Feb 20 '17 at 19:40
  • @Jean-FrançoisFabre Check `if(p == line)` first, then `if (!isalnum(p-1))`, then `strlen(p)` and then `if (!isalnum(p+strlen(word_to_find)))` and you have checked all possibilities. – sabbahillel Feb 20 '17 at 19:53
  • no: if `p` is `NULL` then `isalnum(p[-1])` will read wrong memory. – Jean-François Fabre Feb 20 '17 at 19:56
  • The assumption was that p is not null (a match was found). Then check if p is the start of line (so p-1 is not valid). If p is not line start (so p-1 is valid) then check if it is a delimiter (`!isalnum()`). This is similar to the way the end is checked. – sabbahillel Feb 20 '17 at 20:03
  • I apologize for the typo on putting in parentheses for brackets. I was typing too fast – sabbahillel Feb 20 '17 at 20:04
  • ok, it works, but my approach is faster when the sentence _starts_ with the word, because I handle that case specially and don't have to test for NULL in that case. – Jean-François Fabre Feb 20 '17 at 20:06
  • Ah, now I see the extension in the if. I missed it. Good way of doing it. – sabbahillel Feb 20 '17 at 20:07
  • Hey @Jean-FrançoisFabre, thanks for your help. I tried your solution and it works on all test cases except this one: "hellohello hello abc", it doesn't detect the match in the second word. I tried to find a fix but still haven't found anything. Any Ideas? – A. Wali Feb 20 '17 at 20:29
  • `isalnum(*p)` should be `isalnum((unsigned char)*p)` to avoid undefined behavior on negative char values if the `char` type is signed. – chqrlie Feb 20 '17 at 20:45
  • @chqrlie didn't know that. edited. `isascii()` would do that check too I guess. but that would make 2 calls. – Jean-François Fabre Feb 20 '17 at 20:49
  • @Jean-FrançoisFabre: That's my favorite part of SO: learn something new every day! `isascii()` is not a standard function, and `isprint()` has the same issue as most other functions in `<ctype.h>`, including `tolower()` and `toupper()`: the `int` argument can have all the values of type `unsigned char` or the special value `EOF`, but no other. This is the exact same set of values returned by `fgetc()`, but not necessarily a superset of the values of type `char`. In my opinion, this is a strong argument for making `char` unsigned by default, but too many systems seem to rely on the opposite choice. – chqrlie Feb 20 '17 at 21:09
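The rule in the last comment is usually applied with a cast at every call site, as in the answers above. A small wrapper (the name `is_word_char` is hypothetical) keeps the cast in one place:

```c
#include <ctype.h>

/* Hypothetical helper: classify a plain char safely. Where char is signed,
   bytes >= 0x80 become negative ints, and passing those to isalnum() is
   undefined behavior; converting to unsigned char first keeps the argument
   in the range the <ctype.h> functions accept. */
int is_word_char(char c)
{
    return isalnum((unsigned char)c) != 0;
}
```

In the default "C" locale, a byte such as 0xE9 is then simply classified as non-alphanumeric instead of triggering undefined behavior.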

Since `strstr()` returns a pointer to the start of the substring you want to identify, you can use `strlen(result)` to check whether it is part of a longer string or the isolated word you are looking for. If `strlen(result) == strlen("hello")`, the match runs to the end of the string, so it ends correctly. If the character right after the match is a space or punctuation (or some other delimiter), it is also isolated at the end. You would also need to check that the start of the substring is either at the beginning of the long string or preceded by a blank, punctuation, or other delimiter.
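A minimal sketch of this approach, assuming "delimiter" means whitespace or punctuation (`isspace()` / `ispunct()`). The function names are hypothetical, and, like the description, it only examines the first occurrence found by `strstr()`:

```c
#include <ctype.h>
#include <string.h>

/* Assumed delimiter set: whitespace or punctuation. */
static int is_delim(char c)
{
    return isspace((unsigned char)c) || ispunct((unsigned char)c);
}

/* Hypothetical helper: 1 if the first strstr() hit for word is isolated. */
int isolated_match(const char *line, const char *word)
{
    const char *result = strstr(line, word);
    size_t wlen = strlen(word);

    if (result == NULL || wlen == 0)
        return 0;
    /* start: beginning of the string, or preceded by a delimiter */
    if (result != line && !is_delim(result[-1]))
        return 0;
    /* end: strlen(result) == wlen means the word ends the string;
       otherwise the next character must be a delimiter */
    return strlen(result) == wlen || is_delim(result[wlen]);
}
```

With Dmitri's example from the comments, `isolated_match("hello world", "hello")` returns 1: `strlen(result)` is 11 rather than 5, but the character after the match is a space.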

sabbahillel
  • ...but `strlen()` will happily continue past the word even if the next characters aren't alphanumeric, so how does this help? – Dmitri Feb 20 '17 at 19:44
  • @Dmitri I added the necessary check reference for a delimiter in the post. I typed too fast and left out the next sentence that I had meant to put in. – sabbahillel Feb 20 '17 at 19:45
  • `char *hs="hello", *str="hello world"; char *x=strstr(str,hs);` ... then `strlen(x)` is 11, and `strlen(hs)` is only 5... but the match should be valid (a whole-word match). – Dmitri Feb 20 '17 at 19:51
  • @Dmitri `strlen()` checks the possibility of it ending after the substring. If they are not equal, then check for a space or other delimiter. I just do not like to make the Null character part of the delimiter check. I say check for a space or other delimiter if `strlen()` does not match. – sabbahillel Feb 20 '17 at 19:58