12

I am trying to implement a strnstr function into C (strstr but it checks the length), for some reason it doesn't work (output is always no):

#include <stdio.h>

char *searchingFor = "stackdummy";
char *in = "la da\ndoo a da\nnow here comes the stack\nok there it was.\n";

char *strnstr(char *s1, char *s2, int length) {
    if(s1 == NULL || s2 == NULL) return NULL;
    printf("searching \n\n\"%s\"\n for %.*s\n", s1, length, s2);
    char *ss1 = malloc(strlen(s1) + 1);
    strcpy(ss1, s1);
    char *ss2 = malloc(length + 1);
    strncpy(ss2, s2, length);
    char *result = strstr(ss1, ss2);
    free(ss1);
    free(ss2);
    return result;
}

int main(void) {
    printf("found: %s\n", strnstr(in, searchingFor, 5) ? "yes" : "no");
    printf("found: %s\n", strnstr(in, searchingFor, 5) ? "yes" : "no");
    printf("found: %s\n", strnstr(in, searchingFor, 5) ? "yes" : "no");
    return 0;
}
Jimmay
  • 959
  • 3
  • 13
  • 27
  • 3
    `strncpy` does not add a terminating zero. – Jongware Jun 02 '14 at 17:10
  • 1
    Well, at least it's [in the documentation](http://en.cppreference.com/w/c/string/byte/strncpy). – Jongware Jun 02 '14 at 17:12
  • @MOehm my mistake, really. There's nothing wrong with pointers in a Boolean context. Thanks for the critique, btw :) – 7heo.tk Jun 02 '14 at 17:59
  • @7heo.tk: Well, I haven't said anything, then. – M Oehm Jun 02 '14 at 18:01
  • 1
    1) After `strncpy(ss2, s2, length);`, add `ss2[length] = '\0';` to insure `ss2` is terminated. 2) note: returning pointer to a malloc'd variable. – chux - Reinstate Monica Jun 02 '14 at 19:14
  • An alternative implementation can be found here: [Simple and safe implementation](http://codereview.stackexchange.com/questions/132519/a-simple-and-safe-implementation-for-strnstr-search-for-a-substring-in-the-firs) – CplusPuzzle Jun 20 '16 at 14:14

3 Answers3

10

The implementation provided by Chris Dodd has the following disadvantages:

  1. It defeats the purpose of strnstr in that the while condition uses the unbounded string function strchr
  2. It depends on haystack being NULL terminated, which is a deviation from the usual implementation of strnstr, for example as provided by GNU-Darwin
  3. The call to strchr is an unnecessary function call when strchar is not inlined
  4. Returns haystack instead of NULL when len is zero, a deviation from the accepted strstr semantics
  5. Returns an empty string instead of haystack when needle has length of zero

The following implementation remedies the above problems without becoming as difficult to read as the GNU-Darwin implementation, and is Creative Commons licensed:

#include <string.h>

char *strnstr(const char *haystack, const char *needle, size_t len)
{
        int i;
        size_t needle_len;

        if (0 == (needle_len = strnlen(needle, len)))
                return (char *)haystack;

        for (i=0; i<=(int)(len-needle_len); i++)
        {
                if ((haystack[0] == needle[0]) &&
                        (0 == strncmp(haystack, needle, needle_len)))
                        return (char *)haystack;

                haystack++;
        }
        return NULL;
}
Community
  • 1
  • 1
Jonathan Ben-Avraham
  • 4,615
  • 2
  • 34
  • 37
  • This seems broken as it will fail in the OP's code, because it runs off the end of `needle` (which is not null terminated). Also, `strchr` (or `memchr`) is generally faster than a 1-char-at-a-time for-loop, as it usually works word-at-a-time. – Chris Dodd Oct 21 '14 at 05:45
  • @ChrisDodd: You are correct regarding `needle`. However the NULL termination is required in all of the common string library (Linux, Mac, FreeBSD) implementations and so my assumption was that the OP really meant `strnstr` when he wrote `strnstr`. Regarding the use of `strchr` instead of a one-at-a-time loop, `strchr` itself is implemented as a one-at-a-time loop, so moving the loop to `strnstr` saves the function call overhead. The `memchr` implementation is even slower than `strchr`, and here you would need to add an additional test for a NULL character before `len` chars. – Jonathan Ben-Avraham Oct 21 '14 at 16:22
  • 1
    This doesn't work if len == strlen(needle) which looks like a pretty common scenario. – Ale Morales Jul 29 '15 at 16:13
  • 1
    @almosnow Whoops, your right. Off by one error. Try it now. Thanks! – Jonathan Ben-Avraham Aug 01 '15 at 19:32
  • There's a difference between the real strnstr and this tho, as the real one crashes (segmentation fault) when NULL is given for big and len is greater than the length of needle. In your case the for doesn't execute (cause len_little - len is negative) so the program doesn't reach if((haystack[0]....) to crash since it attempts NULL + 0, so your implementation doesn't crush whereas the real one does. Might not seem like much, but if you want an implementation that matches strnstr in everything the real one does, then you have to account for crashes as well. – Gabriel Em Dec 07 '17 at 17:08
  • One more difference between this implementation and BSD one when calling with the following arguments: `strnstr("abcdef", "abc", 2)`. This impl returns ptr to haystack, the BSD impl returns NULL. NULL is correct because we only should use 2-char portion of haystack. – 4LegsDrivenCat Mar 03 '20 at 19:31
  • I'm afraid your implementation is incorrect: `len` is the maximum number of characters from `haystack` to consider, all characters in null terminated `needle` must match a portion of the first `len` characters of `haystack`. – chqrlie Sep 21 '20 at 08:01
  • "Returns haystack instead of NULL when len is zero, a deviation from the accepted strstr semantics". Doesn't this implementation do the same? in "if (0 == (needle_len = strnlen(needle, len))) return (char *)haystack"; – user7048748 Dec 01 '21 at 18:35
3

How about:

char *strnstr(char *haystack, char *needle, size_t len) {
    if (len == 0) return haystack; /* degenerate edge case */
    while (haystack = strchr(haystack, needle[0])) {
        if (!strncmp(haystack, needle, len)) return haystack;
        haystack++; }
    return 0;
}

If you want haystack to not be null terminated, you'll need two length args:

char *memmem(char *haystack, size_t hlen, char *needle, size_t nlen) {
    if (nlen == 0) return haystack; /* degenerate edge case */
    if (hlen < nlen) return 0; /* another degenerate edge case */
    char *hlimit = haystack + hlen - nlen + 1;
    while (haystack = memchr(haystack, needle[0], hlimit-haystack)) {
        if (!memcmp(haystack, needle, nlen)) return haystack;
        haystack++; }
    return 0;
}

which is available in GNU libc, though older versions are broken.

Chris Dodd
  • 119,907
  • 13
  • 134
  • 226
  • Wow, that's a much cleaner approach than mine. – Jimmay Jun 02 '14 at 17:28
  • Though first line should also check for if(len == 0 || !haystack || !needle) return haystack; – Jimmay Jun 02 '14 at 17:30
  • As written its analogous to strncmp -- `len == 0` is well defined as comparing zero chars, but passing a null pointer is undefined if `len > 0`. – Chris Dodd Jun 02 '14 at 17:37
  • 1
    @ChrisDodd Possibly pointless optimization, but given that you've already identified `haystack[0] == needle[0]`, you could `strncmp(haystack+1, needle+1, len-1)`... I guess the pointer adjustments could potentially outweigh the gain from not re-comparing the first character in some cases... – twalberg Jun 02 '14 at 18:15
  • This approach is also pathologically slow (quadratic time). If your system has `memmem` (nonstandard but fairly common) I think it could be used to implement "strnstr" efficiently, but doing so is not entirely trivial still. – R.. GitHub STOP HELPING ICE Jun 02 '14 at 18:32
  • Note: This code does not prevent reading `&haystack[len]` and beyond. – chux - Reinstate Monica Jun 02 '14 at 19:18
  • @R: This is O(nm) time, not quadratic, and is slower than optimal only on pathological inputs. @chux: `len` is the length of `needle` and has no relation to the length of `haystack`. – Chris Dodd Jun 02 '14 at 21:08
  • FreeBSD's version of `strnstr` treats `len` as length of portion of haystack to scan, not the length of needle. In your code you also treat it this way: `if (len == 0) return haystack;`, `if (nlen == 0) return haystack;` (which contradicts your comment). So results of your implementation differ from FreeBSD ones, e.g. for `std::strnstr("abcdef", "abc", 2)` and `std::strnstr("abcdef", "bcd", 3)`. – 4LegsDrivenCat Mar 03 '20 at 19:51
0

The strnstr function is not defined in the C Standard, it is available on BSD and some other systems as an extension.

Here is the man page on OS/X:

NAME

strstr, strcasestr, strnstr -- locate a substring in a string

LIBRARY

Standard C Library (libc, -lc)

SYNOPSIS

    #include <string.h>

[...]

    char *strnstr(const char *haystack, const char *needle, size_t len);

[...]

DESCRIPTION

[...]

The strnstr() function locates the first occurrence of the null-terminated string needle in the string haystack, where not more than len characters are searched. Characters that appear after a '\0' character are not searched. Since the strnstr() function is a FreeBSD specific API, it should only be used when portability is not a concern.

RETURN VALUES

If needle is an empty string, haystack is returned; if needle occurs nowhere in haystack, NULL is returned; otherwise a pointer to the first character of the first occurrence of needle is returned.

EXAMPLES

The following sets the pointer ptr to the "Bar Baz" portion of largestring:

       const char *largestring = "Foo Bar Baz";
       const char *smallstring = "Bar";
       char *ptr;

       ptr = strstr(largestring, smallstring);

The following sets the pointer ptr to NULL, because only the first 4 characters of largestring are searched:

       const char *largestring = "Foo Bar Baz";
       const char *smallstring = "Bar";
       char *ptr;

       ptr = strnstr(largestring, smallstring, 4);

This specification is not concise enough, (the man page for the linux kernel version is even more imprecise), yet the example on BSD systems (notably above here) is clear: len is the maximum number of bytes to consider in haystack, not needle, which is just a regular null terminated C string.

Your function does not work for multiple reasons:

  • the semantics are incorrect as you consider length to limit s2 instead of s1
  • in your approach, duplicating s1 is useless and counter-productive: result, if non NULL, will point into the allocated copy that is freed before returning from the function, hence accessing the string pointed to by the return value will have undefined behavior.
  • strncpy does not null terminate the destination array if the source string has at least length characters before its own null terminator. You must set ss2[length] = '\0'; for your approach to work, but again, the real strnstr() function operates differently.
  • using malloc() and free() is probably not what you are expected to do, and failing to test for potential allocation failure is a mistake.

Here is a corrected version:

char *strnstr(const char *s1, const char *s2, size_t n) {
    // simplistic algorithm with O(n2) worst case
    size_t i, len;
    char c = *s2;

    if (c == '\0')
        return (char *)s1;

    for (len = strlen(s2); len <= n; n--, s1++) {
        if (*s1 == c) {
            for (i = 1;; i++) {
                if (i == len)
                    return (char *)s1;
                if (s1[i] != s2[i])
                    break;
            }
        }
    }
    return NULL;
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189