Functions strdup() and strndup() have finally made it into the upcoming C23 Standard:

7.24.6.4 The strdup function

Synopsis

#include <string.h>
char *strdup(const char *s);

The strdup function creates a copy of the string pointed to by s in a space allocated as if by a call to malloc.

Returns
The strdup function returns a pointer to the first character of the duplicate string. The returned pointer can be passed to free. If no space can be allocated the strdup function returns a null pointer.

7.24.6.5 The strndup function

Synopsis

#include <string.h>
char *strndup(const char *s, size_t size);

The strndup function creates a string initialized with no more than size initial characters of the array pointed to by s and up to the first null character, whichever comes first, in a space allocated as if by a call to malloc. If the array pointed to by s does not contain a null within the first size characters, a null is appended to the copy of the array.

Returns
The strndup function returns a pointer to the first character of the created string. The returned pointer can be passed to free. If no space can be allocated the strndup function returns a null pointer.
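To make the wording above concrete, here is a minimal sketch of a function with the same observable behavior. The name `sketch_strndup` is hypothetical, and this only illustrates the specified semantics; it is not the code of any particular C library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name: a sketch of the strndup semantics, not a libc implementation. */
char *sketch_strndup(const char *s, size_t size)
{
    /* Length of the copy: stop at the first null character or at size,
       whichever comes first. memchr returns NULL if no null is found. */
    const char *end = memchr(s, '\0', size);
    size_t len = end ? (size_t)(end - s) : size;

    char *copy = malloc(len + 1);   /* one extra byte for the terminator */
    if (copy == NULL)
        return NULL;                /* allocation failed */

    memcpy(copy, s, len);
    copy[len] = '\0';               /* the result is always null terminated */
    return copy;
}
```

For example, `sketch_strndup("hello world", 5)` allocates 6 bytes holding `"hello"`, and the caller releases the result with `free`.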

Why was the POSIX-2008 function strnlen not considered for inclusion?

#include <string.h>
size_t strnlen(const char *s, size_t maxlen);

The strnlen() function shall compute the smaller of the number of bytes in the array to which s points, not including the terminating NUL character, or the value of the maxlen argument. The strnlen() function shall never examine more than maxlen bytes of the array pointed to by s.
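For readers without a POSIX library at hand, the description above can be sketched as a simple bounded loop. The name `sketch_strnlen` is hypothetical, and real implementations are typically optimized differently.

```c
#include <stddef.h>

/* Hypothetical name: mirrors the POSIX wording quoted above. */
size_t sketch_strnlen(const char *s, size_t maxlen)
{
    size_t len = 0;
    /* The bound is tested before each read, so no byte at an
       index >= maxlen is ever examined. */
    while (len < maxlen && s[len] != '\0')
        len++;
    return len;
}
```

A return value equal to `maxlen` means no terminator was found within the first `maxlen` bytes.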

chqrlie
  • Hmm, so `strndup()` allocates up to `size + 1` bytes (I expected `size` or call it `maxlen`) and `strnlen()` returns up to `size`? Such `n` functions are useful yet a little fuzzy on the extreme cases. – chux - Reinstate Monica May 26 '22 at 14:18
  • @chux-ReinstateMonica: `strndup(s, n)` allocates at least `strnlen(s, n) + 1` bytes. This is consistent with `strncat(s1, s2, n)` copying `strnlen(s2, n)` bytes from `s2` and adding a null terminator. The *fuzzy* one is `strncpy` of course. – chqrlie May 26 '22 at 15:18

2 Answers

Interestingly, this function was proposed in https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2351.htm

It was discussed at the London meeting in 2019. See the agenda: https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2370.htm

The discussion minutes can be found at https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2377.pdf (page 59).

It was rejected due to no consensus.

6.33 Sebor, Add strnlen to C2X [N 2351]

...

Straw poll: Should N2351 be put into C2X?

(11/6/6)

Not clear consensus.

As a result, the function was not added.

tstanisl
  • Considering the gazillion new floating point names added, namespace pollution cannot be invoked, especially since `strnlen` is a reserved name anyway. – chqrlie May 26 '22 at 15:19
  • @chqrlie Re: "no consensus": FYI: the "Annex K: Fix or deprecate" ended as "[Deprecate/Fix/Abstain: 6/6/5](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2307.pdf)". So, "no consensus, keep it untouched". – pmor Aug 02 '22 at 00:08
  • @pmor: the committee is so conservative on some aspects. Annex K cannot be *fixed*: the Microsoft semantics are inconsistent and cumbersome, yet the only implementation with significant usage is theirs, and they are unlikely to change those semantics... so removing Annex K and making these functions a proprietary extension is the only choice IMHO. – chqrlie Aug 02 '22 at 08:10
  • @chqrlie Re: "Annex K cannot be fixed": 6 people don't think so. I personally found many issues while implementing some of the Annex K functions. Then started reading about "why Annex K is flawed / unfinished". Found defect reports, Red Hat [proposal](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm) to remove it, and finally the "6/6/5". – pmor Aug 04 '22 at 01:39

One argument against strnlen is that it's a superfluous function, since we already have memchr. Example:

const char str[666] = "hello world";
size_t length1 = strnlen(str,666);
size_t length2 = (char*)memchr(str,'\0',666) - str;
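Note that `memchr` returns a null pointer when no `'\0'` occurs within the first `n` bytes, so the subtraction is only meaningful when a terminator is present. A minimal sketch that checks for this case first might look as follows; the helper name `bounded_length` and its interface are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if a terminator exists within the first n bytes of
   buf, stores the string length in *len and returns true. Otherwise returns
   false and leaves *len untouched. */
bool bounded_length(const char *buf, size_t n, size_t *len)
{
    const char *end = memchr(buf, '\0', n);
    if (end == NULL)
        return false;              /* no terminator: buf[0..n) is not a string */
    *len = (size_t)(end - buf);
    return true;
}
```

With this shape, the "not a string" case is reported through the return value instead of being encoded as a length equal to `n`.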

Advantages of memchr:

  • Has been a standard C function since the dawn of time.
  • Possibly more efficient than strnlen in some situations (?).
  • More generic API.
  • memchr already ought to be in use for the purpose of sanitising supposed string input before calling functions like strcpy, so what purpose strnlen fills is unclear.
  • Has proper error handling, unlike strnlen which does not tell if it failed or not.

Disadvantages:

  • More awkward and type-unsafe interface for the purpose of finding a string length specifically.
Lundin
  • `length2 = (char*)memchr(str,'\0',666) - str;` would have undefined behavior if `str` does not have a null terminator at all, whereas `strnlen()` can be passed a non null terminated array as long as the size argument is at most the length of the array. – chqrlie Jun 01 '22 at 13:10
  • @chqrlie Yeah well, my example doesn't have error handling for that. But now that you mention it, that's actually yet another strong argument for _not_ using `strnlen` - it doesn't report errors. Illustration: https://godbolt.org/z/dhfE6P38v – Lundin Jun 01 '22 at 13:28
  • passing an array without a null terminator in the first `n` bytes is not an error. Bytes beyond the first `n` should not be examined. The argument does not have to be a null terminated string. The same semantics are used for `strncat`: *The `strncat` function appends not more than `n` characters (a null character and characters that follow it are not appended) from the array pointed to by `s2` to the end of the string pointed to by `s1`.* – chqrlie Jun 01 '22 at 13:58
  • @chqrlie Yes but my point is that if you use `memchr` you get error handling for free. Whereas `strnlen` cannot be used for input sanitation. And in case you do know that the input string is proper, you might as well use `strlen` since it's faster than `strnlen`. – Lundin Jun 01 '22 at 14:06
  • It looks like `strnlen` can be used for input sanitation. A string sitting in a buffer of size `N` cannot possibly have length of `N`. If `strnlen` returns `N`, you know the buffer contains something that is not a string. – n. m. could be an AI Jun 01 '22 at 14:11
  • @n.1.8e9-where's-my-sharem. Yeah that's true. But either way, I see no advantage of using `strnlen` over `memchr`. (Also micro-optimization, a check vs zero might be a tiny bit faster than comparing two integers for equivalence.) – Lundin Jun 01 '22 at 14:50
  • @Lundin: pushing your argument, `strlen` is useless since you can use `memchr(p, '\0', -1) - p`, or better `strchr(p, '\0') - p`, `strcpy` can be replaced with `memcpy(dest, src, strlen(src) + 1)`... – chqrlie Jun 01 '22 at 18:09
  • @chqrlie Those examples are a bit ridiculous, but truth is that many C programmers are brainwashed to always use certain functions for certain tasks. Consider `str1 = malloc(strlen(str2)+1); strcpy(str1, str2);` This code is needlessly slow, because the programmer can't think outside the box. If we know the string length, then why use slow `strcpy`? It could be optimized as `size_t n = strlen(str2)+1; str1 = malloc(n); memcpy(str1, str2, n);` Just as readable, only faster. – Lundin Jun 02 '22 at 06:29
  • @Lundin: Good example! Of course `str1 = strdup(str2);` is even more readable, just as fast, less error prone, fully defined and usable directly as a function argument. Too bad it took so long for the C Standard to include simple POSIX utility functions. – chqrlie Jun 02 '22 at 07:00
  • @chqrlie Yep, a correctly implemented strdup ought to boil down to inlining of strlen+malloc+memcpy. – Lundin Jun 02 '22 at 09:03
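
The `strlen` + `malloc` + `memcpy` pattern from the last few comments can be sketched as a standalone function. The name `sketch_strdup` is hypothetical; this shows the idea only and is not the implementation used by any specific C library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name: a sketch of strdup built from strlen, malloc and memcpy. */
char *sketch_strdup(const char *s)
{
    size_t n = strlen(s) + 1;   /* length including the null terminator */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);     /* the length is known, so no second scan is needed */
    return copy;                /* NULL if the allocation failed */
}
```

The string is scanned once to find its length, and the copy then runs on a known byte count; the caller owns the result and releases it with `free`.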