113

Is there something like startsWith(str_a, str_b) in the standard C library?

It should take pointers to two strings that end with null bytes, and tell me whether the first one also appears completely at the beginning of the second one.

Examples:

"abc", "abcdef" -> true
"abcdef", "abc" -> false
"abd", "abdcef" -> true
"abc", "abc"    -> true
Ciro Santilli OurBigBook.com
thejh

10 Answers

221

There's no standard function for this, but you can define

bool prefix(const char *pre, const char *str)
{
    return strncmp(pre, str, strlen(pre)) == 0;
}

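We don't have to worry about str being shorter than pre because, according to the C standard (7.21.4.4/2):

"The strncmp function compares not more than n characters (characters that follow a null character are not compared) from the array pointed to by s1 to the array pointed to by s2."

If str is shorter than pre, the comparison stops no later than str's terminating null, where the two strings differ, so the result is nonzero.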
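As a quick check of that reasoning, the sketch below runs prefix against the four examples from the question, plus an empty prefix. The count_failures helper and its table of cases are hypothetical, added only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

bool prefix(const char *pre, const char *str)
{
    return strncmp(pre, str, strlen(pre)) == 0;
}

/* Hypothetical helper: returns how many cases disagree with the
   expected result (0 means every case behaved as expected). */
int count_failures(void)
{
    static const struct {
        const char *pre;
        const char *str;
        bool expected;
    } cases[] = {
        {"abc",    "abcdef", true},
        {"abcdef", "abc",    false},  /* str shorter than pre */
        {"abd",    "abdcef", true},
        {"abc",    "abc",    true},
        {"",       "abc",    true},   /* empty prefix always matches */
    };
    int failures = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        if (prefix(cases[i].pre, cases[i].str) != cases[i].expected)
            failures++;
    }
    return failures;
}
```

In the second case strncmp compares 'd' against str's terminating null and stops there, so nothing past the end of str is read.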

T.J. Crowder
Fred Foo
91

Apparently there's no standard C function for this. So:

bool startsWith(const char *pre, const char *str)
{
    size_t lenpre = strlen(pre),
           lenstr = strlen(str);
    return lenstr < lenpre ? false : memcmp(pre, str, lenpre) == 0;
}

Note that the above is nice and clear, but if you're doing it in a tight loop or working with very large strings, it does not offer the best performance, as it scans the full length of both strings up front (strlen). Solutions like wj32's or Christoph's may offer better performance (although this comment about vectorization is beyond my ken of C). Also note Fred Foo's solution which avoids strlen on str (he's right, it's unnecessary if you use strncmp instead of memcmp). Only matters for (very) large strings or repeated use in tight loops, but when it matters, it matters.

T.J. Crowder
  • 7
    I should mention that the *usual* thing would be for the string to be the first parameter, and the prefix to be the second. But I kept them as above because that seemed to be how your question was framed... The order is entirely up to you, but I really should have done it the other way 'round -- most string functions take the full string as the first argument, the substring as the second. – T.J. Crowder Jan 22 '11 at 22:31
  • 2
    This is an elegant solution, but it does have some performance issues. An optimized implementation would never look at more than min(strlen(pre), strlen(str)) characters from each string, nor would it ever look beyond the first mismatch. If the strings were long, but early mismatches were common, it would be very lightweight. But since this implementation takes the full length of both strings right up front, it forces worst-case performance, even if the strings differ in the very first character. Whether this matters really depends on the circumstances, but it's a potential problem. – Tom Karzes Jan 06 '18 at 11:33
  • @TomKarzes: Absolutely, I've gotten spoiled by languages/environments where string length is a known value rather than one we have to go figure out. :-) [wj32's solution](https://stackoverflow.com/a/4771055/157247) offers much better performance. Only matters for (very) large strings or tight loops, but when it matters, it matters. – T.J. Crowder Jan 06 '18 at 11:38
  • 2
    @TomKarzes You can substitute `memcmp` for `strncmp` here and it's faster. There's no UB because both strings are known to have at least `lenpre` bytes. `strncmp` checks each byte of both strings for NUL, but the `strlen` calls already guaranteed that there aren't any. (But it still has the performance hit you mentioned, when `pre` or `str` are longer than the actual common initial sequence.) – Jim Balter Aug 01 '19 at 18:43
  • 1
    @JimBalter - Very good point! Since using `memcmp` above wouldn't be appropriating from another answer here, I went ahead and changed it in the answer. – T.J. Crowder Aug 02 '19 at 06:46
  • 1
    P.S. This (now) may be the fastest answer on some machines with some strings, because `strlen` and `memcmp` can be implemented with very fast hardware instructions, and the `strlen`s may put the strings into the cache, avoiding a double memory hit. On such machines, `strncmp` could be implemented as two `strlen`s and a `memcmp` just like this, but it would be risky for a library writer to do so, as that could take much longer on long strings with short common prefixes. Here that hit is explicit, and the `strlen`s are only done once each (Fred Foo's `strlen` + `strncmp` would do 3). – Jim Balter Aug 02 '19 at 17:06
  • 1
    P.P.S. This is even more effective if the function is inlined and the length of one or more argument is already known -- e.g., a constant. Consider checking several different prefixes against the same string -- one `strlen` for the target string, plus a `strlen` (unless constant) and `memcmp` for each prefix (and not even that if the prefix is longer than the target). – Jim Balter Aug 02 '19 at 17:57
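The P.P.S. above describes checking several prefixes against one string while measuring that string only once. A minimal sketch of the idea follows; the names starts_with_len and match_any_prefix are hypothetical and do not appear in any of the answers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant that takes both lengths from the caller,
   so neither string has to be re-measured. */
static inline bool starts_with_len(const char *str, size_t lenstr,
                                   const char *pre, size_t lenpre)
{
    /* The length check guarantees memcmp stays inside both strings. */
    return lenstr >= lenpre && memcmp(pre, str, lenpre) == 0;
}

/* Returns the index of the first matching prefix, or -1 if none match. */
int match_any_prefix(const char *str, const char *const *prefixes, size_t count)
{
    size_t lenstr = strlen(str);   /* measured once for all prefixes */
    for (size_t i = 0; i < count; i++) {
        if (starts_with_len(str, lenstr, prefixes[i], strlen(prefixes[i])))
            return (int)i;
    }
    return -1;
}
```

If the prefixes are compile-time constants, their lengths could be precomputed as well, which would leave only one strlen and one memcmp per candidate.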
41

I'd probably go with strncmp(), but just for fun a raw implementation:

_Bool starts_with(const char *restrict string, const char *restrict prefix)
{
    while(*prefix)
    {
        if(*prefix++ != *string++)
            return 0;
    }

    return 1;
}
Christoph
6

Use the strstr() function: stra == strstr(stra, strb) is true when stra starts with strb.

Reference

The strstr() function finds the first occurrence of string2 in string1. The function ignores the null character (\0) that ends string2 in the matching process.

https://www.ibm.com/docs/en/i/7.4?topic=functions-strstr-locate-substring

Sridhar Sarnobat
gscott
  • 5
    that seems to be a somewhat backwards way of doing it - you'll go through the whole of stra even though it should be clear from a very short initial segment whether strb is a prefix or not. – StasM Jan 22 '11 at 23:02
  • 2
    Premature optimization is the root of all evil. I think this is the best solution, if it is not time critical code or long strings. – Frank Buss Nov 11 '18 at 14:26
  • 4
    @ilw It's a famous saying by famous computer scientists -- google it. It's often misapplied (as it is here) ... see http://www.joshbarczak.com/blog/?p=580 – Jim Balter Aug 02 '19 at 17:22
  • I'm with Frank personally. Unix Philosophy: clarity is better than cleverness. – Sridhar Sarnobat Feb 07 '23 at 02:13
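For reference, the strstr() approach can be wrapped as below. This is only a sketch, and the function name starts_with_strstr is hypothetical. It is correct, but as noted in the comments, when strb is not at the start strstr keeps searching the rest of stra before the result can be rejected.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical wrapper around the strstr() idea. */
bool starts_with_strstr(const char *stra, const char *strb)
{
    /* strstr returns a pointer to the first occurrence of strb in stra,
       or NULL if there is none; strb is a prefix only when that first
       occurrence is at the very start of stra. */
    return strstr(stra, strb) == stra;
}
```

An empty strb counts as a prefix here, because strstr returns its first argument when the search string is empty.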
4

I'm no expert at writing elegant code, but...

int prefix(const char *pre, const char *str)
{
    char cp;
    char cs;

    if (!*pre)
        return 1;

    while ((cp = *pre++) && (cs = *str++))
    {
        if (cp != cs)
            return 0;
    }

    if (!cs)
        return 0;

    return 1;
}
wj32
2

I noticed the following function definition in the Linux Kernel. It returns true if str starts with prefix, otherwise it returns false.

/**
* strstarts - does @str start with @prefix?
* @str: string to examine
* @prefix: prefix to look for.
*/
bool strstarts(const char *str, const char *prefix)
{
     return strncmp(str, prefix, strlen(prefix)) == 0;
}
Farzam
  • How is this different from Fred Foo's answer, apart from the order of arguments? – chqrlie Jul 15 '22 at 22:18
  • 1
    The obvious difference is that I provided a reference to the code that I did not write. The code was added to the Linux Kernel in 2009 [1], 2 years before Fred Foo's answer was posted. So you should question Fred Foo's answer, not mine. [1]: https://github.com/torvalds/linux/commit/66f92cf9d415e96a5bdd6c64de8dd8418595d2fc – Farzam Jul 15 '22 at 23:16
  • this solution is rather obvious, Linus was not the first to write it either. Note that *Christoph*'s solution is simpler and probably more efficient, and the accepted solution is clumsy. – chqrlie Jul 15 '22 at 23:24
1

Optimized (v.2. - corrected):

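#include <stdint.h>

/* The bitwise & is deliberate: both bytes are always read, and the loop
   stops as soon as either is NUL or the two bytes share no set bits. */
uint32_t startsWith( const void* prefix_, const void* str_ ) {
    uint8_t _cp, _cs;
    const uint8_t* _pr = (const uint8_t*) prefix_;
    const uint8_t* _str = (const uint8_t*) str_;
    while ( ( _cs = *_str++ ) & ( _cp = *_pr++ ) ) {
        if ( _cp != _cs ) return 0;
    }
    /* The prefix matched only if it was fully consumed. */
    return !_cp;
}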
Zloten
0

Because I ran the accepted version and had a problem with a very long str, I had to add in the following logic:

bool longEnough(const char *str, size_t min_length) {
    size_t length = 0;
    while (str[length] && length < min_length)
        length++;
    if (length == min_length)
        return true;
    return false;
}

bool startsWith(const char *pre, const char *str) {
    size_t lenpre = strlen(pre);
    return longEnough(str, lenpre) ? strncmp(str, pre, lenpre) == 0 : false;
}
Jordan
0

Or a combination of the two approaches:

_Bool starts_with(const char *restrict string, const char *restrict prefix)
{
    const char * const prefix_end = prefix + 13;
    while (1)
    {
        if ( 0 == *prefix  )
            return 1;   
        if ( *prefix++ != *string++)
            return 0;
        if ( prefix_end <= prefix  )
            return 0 == strncmp(prefix, string, strlen(prefix));
    }  
}

EDIT: The code below does NOT work because if strncmp returns 0 it is not known if a terminating 0 or the length (block_size) was reached.

An additional idea is to compare block-wise. If the block is not equal compare that block with the original function:

_Bool starts_with_big(const char *restrict string, const char *restrict prefix)
{
    size_t block_size = 64;
    while (1)
    {
        if ( 0 != strncmp( string, prefix, block_size ) )
          return starts_with( string, prefix);
        string += block_size;
        prefix += block_size;
        if ( block_size < 4096 )
          block_size *= 2;
    }
}

The constants 13, 64 and 4096, as well as the doubling of block_size, are just guesses. They would have to be tuned for the actual input data and hardware.
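One way to sidestep the ambiguity described in the EDIT is to measure each block of the prefix before comparing it, so the code knows whether the prefix ended inside the block. The sketch below is an assumption about how that could look, not a tuned implementation; the name starts_with_blocks and the fixed BLOCK_SIZE are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block size; like the constants above, just a guess. */
#define BLOCK_SIZE 64

bool starts_with_blocks(const char *string, const char *prefix)
{
    for (;;) {
        /* Count how many non-NUL bytes of prefix fall in this block. */
        size_t n = 0;
        while (n < BLOCK_SIZE && prefix[n] != '\0')
            n++;

        /* strncmp stops at a NUL in string, so a string that is too
           short yields a mismatch rather than an out-of-bounds read. */
        if (strncmp(string, prefix, n) != 0)
            return false;

        /* A partial block means the prefix ended: everything matched. */
        if (n < BLOCK_SIZE)
            return true;

        string += n;
        prefix += n;
    }
}
```

This still touches each prefix byte twice (once to count, once to compare), so whether it beats the simple loop would have to be measured.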

shpc
  • 1
    These are good ideas. Note though that the first one is technically undefined behavior if the prefix is shorter than 12 bytes (13 including NUL) because the language standard does not define the result of calculating an address outside the string other than the immediately following byte. – Jim Balter Aug 02 '19 at 17:17
  • @JimBalter: Could you add a reference? If the pointer is dereferenced and is after the terminating 0 then the dereferenced pointer value is undefined. But why should the address itself be undefined? It is just a calculation. – shpc Aug 12 '19 at 12:24
  • There was a general bug however: The `block_size` incrementation must be after the pointer incrementation. Now fixed. – shpc Aug 12 '19 at 12:47
0

I use this macro:

#define STARTS_WITH(string_to_check, prefix) (strncmp(string_to_check, prefix, ((sizeof(prefix) / sizeof(prefix[0])) - 1)) ? 0:((sizeof(prefix) / sizeof(prefix[0])) - 1))

It returns the prefix length if the string starts with the prefix, and 0 otherwise. This length is evaluated at compile time (sizeof), so there is no runtime overhead.

  • 1
    (I am quite certain that) optimizers would evaluate strlen for literals anyway so there is very little use for the sizeof. Additionally, hiding a sizeof (that only works properly if the respective object is in scope) within a macro is a very easy way to introduce horrendous bugs. – stefanct Oct 04 '21 at 22:41
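One way to guard against the pitfall raised in the comment above is to make the macro accept only string literals. The sketch below relies on adjacent-literal concatenation ("" prefix ""), which fails to compile when prefix is a pointer or any other non-literal expression. The name STARTS_WITH_LIT is hypothetical, and unlike the macro above it yields a plain 0/1 result rather than the prefix length.

```c
#include <string.h>

/* Hypothetical literal-only variant: "" prefix "" compiles only when
   prefix is a string literal, so sizeof always sees the literal's
   array (including its NUL) and never a pointer. */
#define STARTS_WITH_LIT(string_to_check, prefix) \
    (strncmp((string_to_check), "" prefix "", sizeof(prefix) - 1) == 0)
```

With this form, STARTS_WITH_LIT(s, "abc") compiles, while passing a char pointer as the second argument is rejected by the compiler instead of silently comparing sizeof(char *) - 1 bytes.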