66

While learning C I regularly come across resources which recommend that some functions (e.g. gets()) are never to be used, because they are either difficult or impossible to use safely.

If the C standard library contains a number of these "never-use" functions, it would seem necessary to learn a list of them, what makes them unsafe, and what to do instead.

So far, I've learned that functions which:

  • Cannot be prevented from overwriting memory
  • Are not guaranteed to null-terminate a string
  • Maintain internal state between calls

are commonly regarded as being unsafe to use. Is there a list of functions which exhibit these behaviours? Are there other types of functions which are impossible to use safely?

Carson Myers
  • 37,678
  • 39
  • 126
  • 176
  • 2
    Complete list from securecoding : https://www.securecoding.cert.org/confluence/display/seccode/MSC24-C.+Do+not+use+deprecated+or+obsolescent+functions – mpgn Sep 25 '14 at 19:00
  • 5
    Question is being discussed on meta and may attract unusual attention https://meta.stackoverflow.com/questions/385216/what-should-be-done-to-this-old-c-question?cb=1 – Remy May 20 '19 at 11:12
  • There are oddballs like `setjmp()` and `longjmp()` that haven't been mentioned; they're problematic in general, especially for the original intended use of jumping between the current procedure and some other procedure still on the calling stack above the current procedure. They're difficult to use really well. Signal handling can also be problematic. The standard C rules are extremely restrictive; POSIX rules are more sensible. – Jonathan Leffler May 22 '19 at 19:45
  • A lot depends on context. The MISRA C coding guidelines (originally for embedded systems in cars — MISRA is Motor Industry Software Reliability Association) preclude the use of `malloc()` and friends, and outlaw variable length arrays, and flexible array members. There are grounds for doing so in the context for which the guidelines were written. Such proscriptions are not necessarily useful elsewhere. – Jonathan Leffler May 22 '19 at 19:47

9 Answers9

48

In the old days, most of the string functions had no bounds checking. Of course they couldn't just delete the old functions, or modify their signatures to include an upper bound, that would break compatibility. Now, for almost every one of those functions, there is an alternative "n" version. For example:

strcpy -> strncpy
strlen -> strnlen
strcmp -> strncmp
strcat -> strncat
strdup -> strndup
sprintf -> snprintf
wcscpy -> wcsncpy
wcslen -> wcsnlen

And more.

See also https://github.com/leafsr/gcc-poison which is a project to create a header file that causes gcc to report an error if you use an unsafe function.

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
Adam Batkin
  • 51,711
  • 9
  • 123
  • 115
  • 33
    strncpy are not what you should be changing to either. Take for example copying "hello" into a char[5], with the size param of 5. This results in the char[5] are not being NULL terminated! Use strlcpy or equiv please! Or, if you must use something like strncpy, ALWAYS set the same param to 1 less then the max buffer size and manually NULL terminate yourself. http://www.usenix.org/events/usenix99/millert.html – Dan McGrath Aug 10 '09 at 04:21
  • 20
    Note that strlen(), strcmp() and strdup() are safe. The 'n' alternatives give you additional functionality. – caf Aug 10 '09 at 04:22
  • 8
    @Dan strlcpy is a BSDism - a standard C alternative is to set dest[0] = '\0'; and then call strncat() - unlike strncpy(), strncat() always nul-terminates the destination. – caf Aug 10 '09 at 04:24
  • 2
    I assume you meant dest[n] (Where n = 4 for char[5]), since setting the the first char to null does not stop the NULL termination issue when copying buffers are of maximum size. This is why I put the "param to 1 less then the max buffer size and manually NULL terminate yourself" – Dan McGrath Aug 10 '09 at 04:29
  • 4
    Nope, I meant dest[0] = '\0'; strncat(dest, source, dest_size - 1); - the idea is to use strncat(), since it has sane destination-terminating behaviour. – caf Aug 10 '09 at 04:32
  • 2
    Slight caveat: strncpy() doesn't just copy up to n chars of a string but also pads out with NUL chars up to the length n if the string is shorter than n. This can be a massive performance hit if you have allocated big buffers "to be safe" but normally handle small strings (e.g. copying file names to buffer of size PATH_MAX, which is typically 4K). [And, yes, I know PATH_MAX is deprecated]. – Dipstick Aug 10 '09 at 05:34
  • 2
    Instead of mucking about with badly designed low-level functions for manipulating arrays of char for string processing, it makes much more sense to use a library for a higher level string abstraction. –  Aug 10 '09 at 10:23
  • 1
    @liw.fi: Absolutely, I agree. But C still doesn't include a standard string library, other than null-terminated `char*` and the above (and other) manipulation functions – Adam Batkin Aug 10 '09 at 18:28
  • 2
    And, if you're on MSVC, *_s variants (e.g. `strcpy_s`/`strncpy_s`). Shame they're not more widely used, since `strncpy` is still open to programmer's slip-ups and thus is not entirely safe to buffer overflows. – Roman Starkov Jul 19 '10 at 16:35
  • Note that it appears that "strnlen()" is a non-standard function (not part of C99). Some implementations may have it, others might not: http://cboard.cprogramming.com/c-programming/101860-strnlen-implicit-declaration.html – Dave Sep 25 '10 at 20:47
35

Yes, fgets(..., ..., STDIN) is a good alternative to gets(), because it takes a size parameter (gets() has in fact been removed from the C standard entirely in C11). Note that fgets() is not exactly a drop-in replacement for gets(), because the former will include the terminating \n character if there was room in the buffer for a complete line to be read.

scanf() is considered problematic in some cases, rather than straight-out "bad", because if the input doesn't conform to the expected format it can be impossible to recover sensibly (it doesn't let you rewind the input and try again). If you can just give up on badly formatted input, it's useable. A "better" alternative here is to use an input function like fgets() or fgetc() to read chunks of input, then scan it with sscanf() or parse it with string handling functions like strchr() and strtol(). Also see below for a specific problem with the "%s" conversion specifier in scanf().

It's not a standard C function, but the BSD and POSIX function mktemp() is generally impossible to use safely, because there is always a TOCTTOU race condition between testing for the existence of the file and subsequently creating it. mkstemp() or tmpfile() are good replacements.

strncpy() is a slightly tricky function, because it doesn't null-terminate the destination if there was no room for it. Despite the apparently generic name, this function was designed for creating a specific style of string that differs from ordinary C strings - strings stored in a known fixed width field where the null terminator is not required if the string fills the field exactly (original UNIX directory entries were of this style). If you don't have such a situation, you probably should avoid this function.

atoi() can be a bad choice in some situations, because you can't tell when there was an error doing the conversion (e.g., if the number exceeded the range of an int). Use strtol() if this matters to you.

strcpy(), strcat() and sprintf() suffer from a similar problem to gets() - they don't allow you to specify the size of the destination buffer. It's still possible, at least in theory, to use them safely - but you are much better off using strncat() and snprintf() instead (you could use strncpy(), but see above). Do note that whereas the n for snprintf() is the size of the destination buffer, the n for strncat() is the maximum number of characters to append and does not include the null terminator. Another alternative, if you have already calculated the relevant string and buffer sizes, is memmove() or memcpy().

On the same theme, if you use the scanf() family of functions, don't use a plain "%s" - specify the size of the destination e.g. "%200s".

caf
  • 233,326
  • 40
  • 323
  • 462
  • 3
    Personally I think this is a better solution to the question as it gives more explanation rather than just a list, and it answered the questions on gets() and scanf(). – Energy Feb 20 '16 at 11:34
  • Note that `fgets()` retains the newline whereas `gets()` does not. That means you need to use an operation such as `if (fgets(buffer, sizeof(buffer), stdin) != NULL) { buffer[strcspn(buffer, "\n")] = '\0'; …use buffer without newline… }`. The `strcspn()` function has always been a part of Standard C; it is safe whether the buffer does or does not contain a newline (and it won't contain a newline if the line was too long to fit in the buffer, or if the file ended without a newline). – Jonathan Leffler May 22 '19 at 19:20
  • Using `strncat()` correctly is hard. For example: `char dst[16] = ""; char *src = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; strncat(dst, src, sizeof(dst));` yields a buffer overflow, even though you used `strncat()` (or rather, because you misused `strncat()`). It writes a null byte after the end of the array. If you know enough to use `strncat()` safely, you know enough to use better alternatives (`memmove()`, `memcpy()`, etc). – Jonathan Leffler May 22 '19 at 19:23
  • I've added a note that `fgets()` is not a drop-in replacement for `gets()`. For `strncat()` I am not sure that I would characterise using `sizeof dst - 1` instead of `sizeof dst` as "hard", though - you simply must read the documentation and understand the function you are using. I do agree that in most cases when you carefully track the lengths of the strings and remaining buffer space involved that you end up finding `memmove()` / `memcpy()` are the most appropriate functions. – caf May 23 '19 at 00:26
20

strtok() is generally considered to be evil because it stores state information between calls. Don't try running THAT in a multithreaded environment!

vvvvv
  • 25,404
  • 19
  • 49
  • 81
Dave Markle
  • 95,573
  • 20
  • 147
  • 170
  • 3
    It's also considered a bit nutty because of the way that it modifies its first argument. – caf Aug 10 '09 at 04:49
  • 2
    Many CRT implementations use thread-local variables to keep this kind of state information, so it might technically be safe depending on the platform, but it's definitely not a good idea. – Tim Sylvester Aug 10 '09 at 05:08
  • 1
    mscv strtok() uses TLS to store state info, so it thread-safe – f0b0s Aug 10 '09 at 10:38
  • 9
    Thread safety isn't the only problem with `strtok()`. If your code using `strtok()` calls any functions while it is parsing with `strtok()`, none of those called functions can use `strtok()` without screwing your function up. Similarly, no library function can use `strtok()` without documenting that it does because any caller that is also using `strtok()` will be screwed up. These are converse propositions, of course. So, you have to be extremely careful if using `strtok()`. It is fine for toy programs where you're in charge of all the code; it is seldom good for production-quality programs. – Jonathan Leffler Dec 04 '13 at 04:26
  • there's a reentrant alternative: `strtok_r` – Jean-François Fabre May 22 '19 at 20:00
11

Strictly speaking, there is one really dangerous function. It is gets() because its input is not under the control of the programmer. All other functions mentioned here are safe in and of themselves. "Good" and "bad" boils down to defensive programming, namely preconditions, postconditions and boilerplate code.

Let's take strcpy() for example. It has some preconditions that the programmer must fulfill before calling the function. Both strings must be valid, non-NULL pointers to zero terminated strings, and the destination must provide enough space with a final string length inside the range of size_t. Additionally, the strings are not allowed to overlap.

That is quite a lot of preconditions, and none of them is checked by strcpy(). The programmer must be sure they are fulfilled, or he must explicitly test them with additional boilerplate code before calling strcpy():

n = DST_BUFFER_SIZE;
if ((dst != NULL) && (src != NULL) && (strlen(dst)+strlen(src)+1 <= n))
{
    strcpy(dst, src);
}

Already silently assuming the non-overlap and zero-terminated strings.

strncpy() does include some of these checks, but it adds another postcondition the programmer must take care for after calling the function, because the result may not be zero-terminated.

strncpy(dst, src, n);
if (n > 0)
{
    dst[n-1] = '\0';
}

Why are these functions considered "bad"? Because they would require additional boilerplate code for each call to really be on the safe side when the programmer assumes wrong about the validity, and programmers tend to forget this code.

Or even argue against it. Take the printf() family. These functions return a status that indicate error and success. Who checks if the output to stdout or stderr succeeded? With the argument that you can't do anything at all when the standard channels are not working. Well, what about rescuing the user data and terminating the program with an error-indicating exit code? Instead of the possible alternative of crash and burn later with corrupted user data.

In a time- and money-limited environment it is always the question of how much safety nets you really want and what is the resulting worst case scenario? If it is a buffer overflow as in case of the str-functions, then it makes sense to forbid them and probably provide wrapper functions with the safety nets already within.

One final question about this: What makes you sure that your "good" alternatives are really good?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Secure
  • 4,268
  • 1
  • 18
  • 16
  • 2
    These functions are bad because they make it easy to write buggy code. Worse, the buggy code is often they type that introduces security vulnerabilities. Sure, they often *can* be used safely and correctly, but it's too easy to use them incorrectly. I know this both from experience and from studies Microsoft has done on millions of lines of code. MS doesn't ban the use of functions 'just because'. They do it because they have hard statistics about the types of code cause bugs (in particular security bugs). This is hard data whether or not Microsoft is a company that you like or respect. – Michael Burr Aug 10 '09 at 14:27
  • 3
    And as far as `strncpy()` is concerned, it's not a 'good' or safe function. It's banned by Microsoft and in my code. – Michael Burr Aug 10 '09 at 14:29
  • Just what I said more verbosely. The "badness" of a function depends on the amount and the type of the conditions the programmer has to ensure, their locality (the destination memory can be defined anywhere in the program) and the severity of the errors that are possible when done wrong. No surprise that string-related functions are on top. And I don't use strncpy() myself, but for the reason that I consider a silent string truncation without any indication that it happened as an error. – Secure Aug 10 '09 at 16:21
  • BTW, reading it again after 3 months, I've confused strcpy and strcat. But nobody cared, anyway. ;) – Secure Nov 02 '09 at 17:18
  • Very nice post +1. I wonder if you are the same Secure who invariably provided (provides?) definitive and reliable C information on the Joel on Software forums ? – Bill Forster Dec 01 '09 at 22:09
  • 1
    Misleading post -1. In the real world, the importance of never calling these deprecated functions is real. Adding all the "boilerplate" is an avenue for errors to enter. The "preconditions" you check for `strcpy()` are mindless; for one thing, why add the source length; for another, checking for NULL is useless and misleading code. – Heath Hunnicutt Jul 06 '10 at 14:13
  • Ah, `strncpy` is designed for a certain old (not quite obsolete) storage format where the trailing null is omitted if the string barely fits without it. To get it back into a memory variable is another `strncpy` call onto another buffer and add the trailing zero yourself. Since the sizes were always constants this was `strncpy(membuf, strptr->diskbuf, bufsiz)[bufsiz]` = 0; In short, `strncpy` does what it was intended to do, which you probably never need to do. – Joshua May 21 '19 at 20:14
  • The `strcpy()` wrapper code is incorrect. It is checking for a hybrid of `strcat()` and `strcpy()`; the length check condition is incorrect for `strcpy()`, but the copying code is incorrect for `strcat()`. – Jonathan Leffler May 22 '19 at 19:34
7

Any function that does not take a maximum length parameter and instead relies on an end-of- marker to be present (such as many 'string' handling functions).

Any method that maintains state between calls.

Mitch Wheat
  • 295,962
  • 43
  • 465
  • 541
  • Not always. Strlen relies on the end-of marker yet is still a safe function to use. – Billy ONeal Aug 10 '09 at 04:47
  • 1
    Probably more precise to say any *destructive* function. – Chuck Aug 10 '09 at 05:02
  • @BillyONeal `strlen` may be unsafe if the input didn't come from a function that actually puts that end of string marker. If it didn't, then all bets are off and you will either get extra info, a wrong answer and if you're unlucky enough to not have a NUL byte anywhere after the string until the first unallocated page, a potential crash. – Paul Stelian Apr 14 '18 at 15:32
6
  • sprintf is bad, does not check size, use snprintf
  • gmtime, localtime -- use gmtime_r, localtime_r
Artyom
  • 31,019
  • 21
  • 127
  • 215
4

To add something about strncpy most people here forgot to mention. strncpy can result in performance problems as it clears the buffer to the length given.

char buff[1000];
strncpy(buff, "1", sizeof buff);

will copy 1 char and overwrite 999 bytes with 0

Another reason why I prefer strlcpy (I know strlcpy is a BSDism but it is so easy to implement that there's no excuse to not use it).

Patrick Schlüter
  • 11,394
  • 1
  • 43
  • 48
3

View page 7 (PDF page 9) SAFECode Dev Practices

Edit: From the page -

strcpy family
strncpy family
strcat family
scanf family
sprintf family
gets family

fish2000
  • 4,289
  • 2
  • 37
  • 76
Dan McGrath
  • 41,220
  • 11
  • 99
  • 130
  • 2
    That also refers to Microsoft's Security Development Lifecycle (SDL) Banned Function Calls at http://msdn.microsoft.com/en-us/library/bb288454.aspx – dajobe Aug 10 '09 at 05:35
2

strcpy - again!

Most people agree that strcpy is dangerous, but strncpy is only rarely a useful replacement. It is usually important that you know when you've needed to truncate a string in any case, and for this reason you usually need to examine the length of the source string anwyay. If this is the case, usually memcpy is the better replacement as you know exactly how many characters you want copied.

e.g. truncation is error:

n = strlen( src );

if( n >= buflen )
    return ERROR;

memcpy( dst, src, n + 1 );

truncation allowed, but number of characters must be returned so caller knows:

n = strlen( src );

if( n >= buflen )
    n = buflen - 1;

memcpy( dst, src, n );
dst[n] = '\0';

return n;
CB Bailey
  • 755,051
  • 104
  • 632
  • 656