How to check if a string starts with another string in C?

Question

Is there something like startsWith(str_a, str_b) in the standard C library?

It should take pointers to two strings that end with nullbytes, and tell me whether the first one also appears completely at the beginning of the second one.

Examples:

"abc", "abcdef" -> true
"abcdef", "abc" -> false
"abd", "abdcef" -> true
"abc", "abc"    -> true

possible duplicate of https://stackoverflow.com/questions/15515088/how-to-check-if-string-starts-with-certain-string-in-c/15515276 — vacing, Oct 15 '19 at 06:52

score 221 · Answer 1 · edited Jul 10 '15 at 14:35


There's no standard function for this, but you can define

#include <stdbool.h>
#include <string.h>

bool prefix(const char *pre, const char *str)
{
    return strncmp(pre, str, strlen(pre)) == 0;
}

We don't have to worry about str being shorter than pre because according to the C standard (7.21.4.4/2):

"The strncmp function compares not more than n characters (characters that follow a null character are not compared) from the array pointed to by s1 to the array pointed to by s2."
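
As a quick check of that claim, here is a minimal sketch that runs the same logic against the cases from the question. The name `has_prefix` and the `assert` precondition are illustrative additions, not part of the answer above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical name; same logic as prefix() above. */
bool has_prefix(const char *pre, const char *str)
{
    assert(pre != NULL && str != NULL);
    /* strncmp stops at the first difference or at a null character,
       so a str shorter than pre is never read past its terminator. */
    return strncmp(pre, str, strlen(pre)) == 0;
}
```

With this, `has_prefix("abc", "abcdef")` is true and `has_prefix("abcdef", "abc")` is false: in the second call the comparison ends at the null character of "abc", which differs from 'd'.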

edited Jul 10 '15 at 14:35

T.J. Crowder


answered Jan 22 '11 at 22:17

Fred Foo


17

Why is the answer no? Clearly, the answer is yes, it's called `strncmp`. – Jasper Feb 13 '17 at 00:39
14

^ It should be obvious why the answer is no. An algorithm that employs `strncmp` and `strlen` is not "called strncmp". – Jim Balter Aug 01 '19 at 18:50
1

It's not a direct `startsWith()` function that returns a boolean. – Sridhar Sarnobat Feb 07 '23 at 02:23

T.J. Crowder · Accepted Answer · 2020-03-25T18:13:27.547

91

Apparently there's no standard C function for this. So:

#include <stdbool.h>
#include <string.h>

bool startsWith(const char *pre, const char *str)
{
    size_t lenpre = strlen(pre),
           lenstr = strlen(str);
    return lenstr < lenpre ? false : memcmp(pre, str, lenpre) == 0;
}

Note that the above is nice and clear, but if you're doing it in a tight loop or working with very large strings, it does not offer the best performance, as it scans the full length of both strings up front (strlen). Solutions like wj32's or Christoph's may offer better performance (although the comment on Christoph's answer about vectorization is beyond my ken of C). Also note Fred Foo's solution, which avoids strlen on str (he's right, it's unnecessary if you use strncmp instead of memcmp). This only matters for (very) large strings or repeated use in tight loops, but when it matters, it matters.

edited Mar 25 '20 at 18:13

answered Jan 22 '11 at 22:26

T.J. Crowder


7

I should mention that the *usual* thing would be for the string to be the first parameter, and the prefix to be the second. But I kept them as above because that seemed to be how your question was framed... The order is entirely up to you, but I really should have done it the other way 'round -- most string functions take the full string as the first argument, the substring as the second. – T.J. Crowder Jan 22 '11 at 22:31
2

This is an elegant solution, but it does have some performance issues. An optimized implementation would never look at more than min(strlen(pre), strlen(str)) characters from each string, nor would it ever look beyond the first mismatch. If the strings were long, but early mismatches were common, it would be very lightweight. But since this implementation takes the full length of both strings right up front, it forces worst-case performance, even if the strings differ in the very first character. Whether this matters really depends on the circumstances, but it's a potential problem. – Tom Karzes Jan 06 '18 at 11:33
@TomKarzes: Absolutely, I've gotten spoiled by languages/environments where string length is a known value rather than one we have to go figure out. :-) [wj32's solution](https://stackoverflow.com/a/4771055/157247) offers much better performance. Only matters for (very) large strings or tight loops, but when it matters, it matters. – T.J. Crowder Jan 06 '18 at 11:38
2

@TomKarzes You can substitute `memcmp` for `strncmp` here and it's faster. There's no UB because both strings are known to have at least `lenpre` bytes. `strncmp` checks each byte of both strings for NUL, but the `strlen` calls already guaranteed that there aren't any. (But it still has the performance hit you mentioned, when `pre` or `str` are longer than the actual common initial sequence.) – Jim Balter Aug 01 '19 at 18:43
1

@JimBalter - Very good point! Since using `memcmp` above wouldn't be appropriating from another answer here, I went ahead and changed it in the answer. – T.J. Crowder Aug 02 '19 at 06:46
1

P.S. This (now) may be the fastest answer on some machines with some strings, because `strlen` and `memcmp` can be implemented with very fast hardware instructions, and the `strlen`s may put the strings into the cache, avoiding a double memory hit. On such machines, `strncmp` could be implemented as two `strlen`s and a `memcmp` just like this, but it would be risky for a library writer to do so, as that could take much longer on long strings with short common prefixes. Here that hit is explicit, and the `strlen`s are only done once each (Fred Foo's `strlen` + `strncmp` would do 3). – Jim Balter Aug 02 '19 at 17:06
1

P.P.S. This is even more effective if the function is inlined and the length of one or more argument is already known -- e.g., a constant. Consider checking several different prefixes against the same string -- one `strlen` for the target string, plus a `strlen` (unless constant) and `memcmp` for each prefix (and not even that if the prefix is longer than the target). – Jim Balter Aug 02 '19 at 17:57
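
A rough sketch of that last idea, checking several prefixes against one target string. The names `starts_with_n` and `first_matching_prefix` are made up for illustration, and the string comes first here, following the usual argument order mentioned earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the caller supplies both lengths, so no strlen here. */
bool starts_with_n(const char *str, size_t lenstr,
                   const char *pre, size_t lenpre)
{
    assert(str != NULL && pre != NULL);
    /* A prefix longer than the string cannot match; otherwise both
       buffers hold at least lenpre bytes, so memcmp is safe. */
    return lenpre <= lenstr && memcmp(str, pre, lenpre) == 0;
}

/* Returns the index of the first matching prefix, or -1 if none match.
   strlen(str) is computed once, not once per prefix. */
int first_matching_prefix(const char *str,
                          const char *const prefixes[], size_t count)
{
    size_t lenstr = strlen(str);
    for (size_t i = 0; i < count; i++) {
        if (starts_with_n(str, lenstr, prefixes[i], strlen(prefixes[i])))
            return (int)i;
    }
    return -1;
}
```

If the prefixes are string literals and the helper is inlined, a compiler may fold the per-prefix `strlen` calls into constants, though that depends on the compiler and optimization level.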

Christoph · Answer 3 · 2011-01-22T23:58:24.293

41

I'd probably go with strncmp(), but just for fun a raw implementation:

_Bool starts_with(const char *restrict string, const char *restrict prefix)
{
    while(*prefix)
    {
        if(*prefix++ != *string++)
            return 0;
    }

    return 1;
}

edited Jan 22 '11 at 23:58

answered Jan 22 '11 at 23:45

Christoph


8

I like this best - there's no reason to scan either of the strings for a length. – Michael Burr Jan 23 '11 at 06:22
1

I would probably go with strlen+strncmp too, but although it does in fact work, all the controversy over its vague definition is putting me off. So I'll use this, thanks. – Sam Watkins Jan 06 '15 at 03:26
6

This is likely to be slower than `strncmp`, unless your compiler is really good at vectorization, because glibc writers sure are :-) – Ciro Santilli OurBigBook.com Jun 27 '15 at 12:39
3

This version should be faster than the strlen+strncmp version if the prefix doesn't match, especially if there are already differences in the first few characters. – dpi Jul 14 '18 at 00:22
If the string is constant, a good compiler knows its length already so this could again be slower... – Antti Haapala -- Слава Україні Sep 28 '18 at 07:49
1

^That optimization would only apply if the function is inlined. – Jim Balter Aug 02 '19 at 17:48

score 6 · Answer 4 · edited Feb 07 '23 at 02:16


Use the strstr() function: stra == strstr(stra, strb) is true when stra starts with strb.

Reference

The strstr() function finds the first occurrence of string2 in string1. The function ignores the null character (\0) that ends string2 in the matching process.

https://www.ibm.com/docs/en/i/7.4?topic=functions-strstr-locate-substring
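
For illustration, the comparison can be wrapped like this. It is only a sketch, and the name `starts_with_strstr` is invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical wrapper around the strstr comparison. */
bool starts_with_strstr(const char *str, const char *pre)
{
    assert(str != NULL && pre != NULL);
    /* strstr returns a pointer to the first occurrence of pre in str
       (or NULL); str starts with pre exactly when that pointer is str. */
    return strstr(str, pre) == str;
}
```

The result is correct, but when `pre` is not at the start, `strstr` still searches the rest of `str` before giving up (or finds a later occurrence), so the cost grows with the length of `str` rather than the length of the prefix.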

edited Feb 07 '23 at 02:16

Sridhar Sarnobat


answered Jan 22 '11 at 22:30

gscott


5

That seems to be a somewhat backwards way of doing it: you'll go through the whole of stra even though it should be clear from a very short initial segment whether strb is a prefix or not. – StasM Jan 22 '11 at 23:02
2

Premature optimization is the root of all evil. I think this is the best solution, if it is not time critical code or long strings. – Frank Buss Nov 11 '18 at 14:26
4

@ilw It's a famous saying by famous computer scientists -- google it. It's often misapplied (as it is here) ... see http://www.joshbarczak.com/blog/?p=580 – Jim Balter Aug 02 '19 at 17:22
I'm with Frank personally. Unix Philosophy: clarity is better than cleverness. – Sridhar Sarnobat Feb 07 '23 at 02:13

score 4 · Answer 5 · answered Jan 22 '11 at 22:30

I'm no expert at writing elegant code, but...

int prefix(const char *pre, const char *str)
{
    char cp;
    char cs;

    if (!*pre)
        return 1;

    while ((cp = *pre++) && (cs = *str++))
    {
        if (cp != cs)
            return 0;
    }

    if (!cs)
        return 0;

    return 1;
}

score 2 · Answer 6 · answered Jul 15 '22 at 22:04


I noticed the following function definition in the Linux Kernel. It returns true if str starts with prefix, otherwise it returns false.

/**
* strstarts - does @str start with @prefix?
* @str: string to examine
* @prefix: prefix to look for.
*/
bool strstarts(const char *str, const char *prefix)
{
     return strncmp(str, prefix, strlen(prefix)) == 0;
}

answered Jul 15 '22 at 22:04

Farzam


How is this different from Fred Foo's answer, apart from the order of arguments? – chqrlie Jul 15 '22 at 22:18
1

The obvious difference is that I provided a reference to the code that I did not write. The code was added to the Linux Kernel in 2009 [1], 2 years before Fred Foo's answer was posted. So you should question Fred Foo's answer, not mine. [1]: https://github.com/torvalds/linux/commit/66f92cf9d415e96a5bdd6c64de8dd8418595d2fc – Farzam Jul 15 '22 at 23:16
this solution is rather obvious, Linus was not the first to write it either. Note that *Christoph*'s solution is simpler and probably more efficient, and the accepted solution is clumsy. – chqrlie Jul 15 '22 at 23:24

Zloten · Answer 7 · 2015-02-09T22:33:24.070

1

Optimized (v.2. - corrected):

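#include <stdint.h>

uint32_t startsWith( const void* prefix_, const void* str_ ) {
    uint8_t _cp, _cs;
    const uint8_t* _pr = (const uint8_t*) prefix_;
    const uint8_t* _str = (const uint8_t*) str_;
    /* & evaluates both sides; it yields 0 at a terminator or when the bytes share no bits (and so differ) */
    while ( ( _cs = *_str++ ) & ( _cp = *_pr++ ) ) {
        if ( _cp != _cs ) return 0;
    }
    return !_cp;
}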

edited Feb 09 '15 at 22:33

answered Nov 05 '14 at 00:58

Zloten


3

voting negative: `startsWith("\2", "\1")` returns 1, `startsWith("\1", "\1")` also returns 1 – thejh Feb 08 '15 at 14:53
This solution will not benefit from clang's optimizations, since it does not use intrinsics. – socketpair Jun 08 '15 at 14:13
^ intrinsics don't help here, especially if the target string is much longer than the prefix. – Jim Balter Aug 02 '19 at 17:39

score 0 · Answer 8 · answered Oct 26 '14 at 01:13

Because I ran the accepted version and had a problem with a very long str, I had to add in the following logic:

bool longEnough(const char *str, size_t min_length) {
    size_t length = 0;
    /* Check the bound first so str[min_length] is never read. */
    while (length < min_length && str[length])
        length++;
    return length == min_length;
}

bool startsWith(const char *pre, const char *str) {
    size_t lenpre = strlen(pre);
    return longEnough(str, lenpre) ? strncmp(str, pre, lenpre) == 0 : false;
}
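
A possible variation on the same idea, sketched with `memchr` instead of a hand-written loop. This assumes a C11-or-later library, where `memchr` is specified to behave as if it reads sequentially and stops at the first match. The names `at_least` and `starts_with_bounded` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for longEnough(): true if str has at least
   min_length characters before its terminating null. */
bool at_least(const char *str, size_t min_length)
{
    assert(str != NULL);
    /* No null character among the first min_length bytes means the
       string is at least that long. Assumes C11 memchr semantics. */
    return memchr(str, '\0', min_length) == NULL;
}

bool starts_with_bounded(const char *pre, const char *str)
{
    size_t lenpre = strlen(pre);
    /* Once str is known to hold lenpre bytes, memcmp is safe. */
    return at_least(str, lenpre) && memcmp(str, pre, lenpre) == 0;
}
```

Either way, only the first `strlen(pre)` bytes of `str` are examined, which is what avoids the cost of a full `strlen(str)` on a very long string.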

shpc · Answer 9 · 2019-08-14T08:53:26.163

Or a combination of the two approaches:

_Bool starts_with(const char *restrict string, const char *restrict prefix)
{
    const char *const prefix_end = prefix + 13;
    while (1)
    {
        if ( 0 == *prefix )
            return 1;
        if ( *prefix++ != *string++ )
            return 0;
        if ( prefix_end <= prefix )
            return 0 == strncmp(prefix, string, strlen(prefix));
    }
}

EDIT: The code below does NOT work because if strncmp returns 0 it is not known if a terminating 0 or the length (block_size) was reached.

An additional idea is to compare block-wise. If the block is not equal compare that block with the original function:

_Bool starts_with_big(const char *restrict string, const char *restrict prefix)
{
    size_t block_size = 64;
    while (1)
    {
        if ( 0 != strncmp( string, prefix, block_size ) )
          return starts_with( string, prefix);
        string += block_size;
        prefix += block_size;
        if ( block_size < 4096 )
          block_size *= 2;
    }
}

The constants 13, 64 and 4096, as well as the doubling of block_size, are just guesses. They would have to be tuned for the input data and hardware in use.

These are good ideas. Note though that the first one is technically undefined behavior if the prefix is shorter than 12 bytes (13 including NUL) because the language standard does not define the result of calculating an address outside the string other than the immediately following byte. — Jim Balter, Aug 02 '19 at 17:17
@JimBalter: Could you add a reference? If the pointer is dereferenced and points past the terminating 0, then the dereferenced value is undefined. But why should the address itself be undefined? It is just a calculation. – shpc, Aug 12 '19 at 12:24
There was a general bug however: The `block_size` incrementation must be after the pointer incrementation. Now fixed. — shpc, Aug 12 '19 at 12:47
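
One way to sidestep the pointer-arithmetic question raised above is to count characters instead of computing an end pointer. A sketch under that assumption follows. `INLINE_LIMIT` and `starts_with_hybrid` are hypothetical names, and 13 is still just a guess.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning constant, standing in for the 13 above. */
#define INLINE_LIMIT 13

bool starts_with_hybrid(const char *string, const char *prefix)
{
    assert(string != NULL && prefix != NULL);
    /* Compare the first few characters by hand, indexing with a counter
       so no pointer beyond the prefix array is ever formed. */
    for (size_t i = 0; i < INLINE_LIMIT; i++) {
        if (prefix[i] == '\0')
            return true;          /* whole prefix matched */
        if (prefix[i] != string[i])
            return false;         /* mismatch, or string ended first */
    }
    /* All INLINE_LIMIT characters matched and were non-null, so both
       offsets below are in range; hand the rest to the library. */
    return strncmp(string + INLINE_LIMIT, prefix + INLINE_LIMIT,
                   strlen(prefix + INLINE_LIMIT)) == 0;
}
```

Short prefixes never reach `strlen` or `strncmp`, and long ones pay for the hand-written loop only on the first 13 characters.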

score 0 · Answer 10 · answered Sep 01 '21 at 06:20


I use this macro:

#define STARTS_WITH(string_to_check, prefix) (strncmp(string_to_check, prefix, ((sizeof(prefix) / sizeof(prefix[0])) - 1)) ? 0:((sizeof(prefix) / sizeof(prefix[0])) - 1))

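It returns the prefix length if the string starts with the prefix, and 0 otherwise. The length is evaluated at compile time (sizeof), so there is no runtime overhead. Note that this only works when the prefix is a string literal or a char array, not a pointer.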

answered Sep 01 '21 at 06:20

Viktor Varga


1

(I am quite certain that) optimizers would evaluate strlen for literals anyway so there is very little use for the sizeof. Additionally, hiding a sizeof (that only works properly if the respective object is in scope) within a macro is a very easy way to introduce horrendous bugs. – stefanct Oct 04 '21 at 22:41
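
To make the scope problem concrete, here is a small sketch using the macro from the answer with its arguments parenthesized. The two wrapper functions are invented for illustration, and the pointer case assumes a typical platform where a `char *` is 4 or 8 bytes. Compilers may warn about the `sizeof` on a pointer, which is rather the point.

```c
#include <assert.h>
#include <string.h>

/* The macro from the answer, with its arguments parenthesized. */
#define STARTS_WITH(s, prefix) \
    (strncmp((s), (prefix), sizeof(prefix) / sizeof((prefix)[0]) - 1) \
         ? 0 : sizeof(prefix) / sizeof((prefix)[0]) - 1)

/* Prefix is a string literal: sizeof sees the array, so this works. */
size_t check_with_literal(const char *s)
{
    assert(s != NULL);
    return STARTS_WITH(s, "ab");
}

/* Prefix is a pointer: sizeof yields the pointer size, not the length,
   so the macro compares and reports the wrong number of characters. */
size_t check_with_pointer(const char *s)
{
    const char *p = "ab";
    assert(s != NULL);
    return STARTS_WITH(s, p);
}
```

With the literal, "abcdef" yields 2 as intended. With the pointer, the same input yields 0 on such platforms, because the comparison runs past the end of "ab" and hits its terminator against 'c'.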
