57

This is inspired by this question and the comments on one particular answer, where I learnt that strncpy is not a very safe string-handling function in C and that it pads the destination with zeros until it has written n bytes, something I was unaware of.

Specifically, to quote R..

strncpy does not null-terminate, and does null-pad the whole remainder of the destination buffer, which is a huge waste of time. You can work around the former by adding your own null termination, but not the latter. It was never intended for use as a "safe string handling" function, but for working with fixed-size fields in Unix directory tables and database files. snprintf(dest, n, "%s", src) is the only correct "safe strcpy" in standard C, but it's likely to be a lot slower. By the way, truncation in itself can be a major bug and in some cases might lead to privilege elevation or DoS, so throwing "safe" string functions that truncate their output at a problem is not a way to make it "safe" or "secure". Instead, you should ensure that the destination buffer is the right size and simply use strcpy (or better yet, memcpy if you already know the source string length).
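To make the two behaviours in that quote concrete, here is a minimal sketch. The helper names copy_checked and strncpy_terminated are made up for illustration and do not come from the quoted comment.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dest (capacity n), always null-terminating when n > 0.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
static int copy_checked(char *dest, size_t n, const char *src)
{
    int needed = snprintf(dest, n, "%s", src);
    return needed >= 0 && (size_t)needed < n;
}

/* Shows the strncpy pitfall: when strlen(src) >= n, no terminator is
 * written. Returns 1 if dest ended up terminated within its n bytes. */
static int strncpy_terminated(char *dest, size_t n, const char *src)
{
    strncpy(dest, src, n);
    return memchr(dest, '\0', n) != NULL;
}
```

Returning a flag for truncation is a deliberate choice here: as the quote says, silent truncation can itself be a bug, so the caller should at least be able to detect it.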

And from Jonathan Leffler

Note that strncat() is even more confusing in its interface than strncpy() - what exactly is that length argument, again? It isn't what you'd expect based on what you supply strncpy() etc - so it is more error prone even than strncpy(). For copying strings around, I'm increasingly of the opinion that there is a strong argument that you only need memmove() because you always know all the sizes ahead of time and make sure there's enough space ahead of time. Use memmove() in preference to any of strcpy(), strcat(), strncpy(), strncat(), memcpy().
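As a sketch of why the strncat() length argument surprises people: it is the maximum number of characters to append from the source, not the size of the destination, and a terminator is always written on top of that. The helper name append_bounded is hypothetical.

```c
#include <string.h>

/* Append src to the string already in dest (total capacity cap bytes),
 * truncating if necessary. The third argument to strncat is the room
 * left for new characters, excluding the terminator it always adds. */
static void append_bounded(char *dest, size_t cap, const char *src)
{
    size_t used = strlen(dest);
    if (used + 1 < cap)
        strncat(dest, src, cap - used - 1);
}
```

Passing sizeof dest as the third argument, by analogy with strncpy, would allow a write past the end of the buffer.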

So, I'm clearly a little rusty on the C standard library. Therefore, I'd like to pose the question:

What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies?

In the interests of objectivity, I have a number of criteria for an answer:

  • Please, if you can, cite design reasons behind the function in question i.e. its intended purpose.
  • Please highlight the misuse to which the code is currently put.
  • Please state why that misuse may lead towards a problem. I know that should be obvious but it prevents soft answers.

Please avoid:

  • Debates over naming conventions of functions (except where this unequivocally causes confusion).
  • "I prefer x over y" - preference is ok, we all have them but I'm interested in actual unexpected side effects and how to guard against them.

As this is likely to be considered subjective and has no definite answer I'm flagging for community wiki straight away.

I am also working as per C99.

Community
  • Any function can be used inappropriately and in ways that can lead to security holes. – Falmarri Jan 03 '11 at 21:36
  • 4
    @Falmarri - but some are frequently used inappropriately where others aren't, some seem to encourage misuse where others don't. –  Jan 03 '11 at 21:41

14 Answers

34

What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies?

I'm gonna go with the obvious:

char *gets(char *s);

It has the remarkable distinction of being simply impossible to use appropriately: it has no way of knowing how large the buffer it writes into is.
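For comparison, a minimal sketch of the usual replacement, fgets(), which does take the buffer size. The helper name read_line is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from stream into buf (capacity n), stripping the newline.
 * Returns buf, or NULL on end-of-file or error. A line longer than n - 1
 * characters is split across calls instead of overflowing buf. */
static char *read_line(char *buf, int n, FILE *stream)
{
    if (fgets(buf, n, stream) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

A typical call would be read_line(line, sizeof line, stdin) with a local char array named line.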

icecrime
  • 6
    MacOS X actually prints out a runtime warning when you use it. – onemasse Jan 03 '11 at 21:53
  • Now that's one function I can honestly say I've never misused, because I've never actually used it. –  Jan 03 '11 at 21:53
  • 9
    `gets()`: the absolute zero of software security. – j_random_hacker Jan 03 '11 at 21:58
  • 8
    Note that C1x will remove `gets()` from the standard. Unfortunately, it will be another 10-20 years after that is finalized before it is removed from most implementations - backwards compatibility with insecurity dictates that. – Jonathan Leffler Jan 03 '11 at 23:23
  • 2
    @onemasse: does it really? I hadn't noticed (but then, I don't use it, even in throwaway code!). Much better that it warns about that than about `mktemp()`, which I do see periodically in some of the code I work on. – Jonathan Leffler Jan 03 '11 at 23:24
  • @Jonathan and MSVC won't support it... especially if it doesn't support C99. –  Jan 04 '11 at 01:21
  • 2
    MSVC might. Their deal with the committee is that they'll support the new standard if the committee adds all their hideous `*_s` "secure" functions to the standard to force *nix implementations to pollute themselves with it. ;-) – R.. GitHub STOP HELPING ICE Jan 04 '11 at 02:45
  • @Jonathan: I don't think it'll take 20 years. I expect most *nix implementations, at least, will be pretty quick to guard it in the header with `#if defined IM_A_MORON_LET_ME_SHOOT_MYSELF_IN_THE_FOOT`. – Stephen Canon Jan 04 '11 at 19:12
  • @JonathanLeffler: The 2011 ISO C standard has indeed removed `gets()` from the standard library. – Keith Thompson Feb 12 '14 at 23:30
  • @KeithThompson: Yes! Now to get `gets()` removed from the system libraries everywhere, or replaced in the system library with `char *gets(char *str) { abort(); }` with a secondary library, `-lgets`, that has to be added to the link line to get the old style insecure `gets()` function. A linker warning would be good (like for `mktemp()`) -- if that doesn't already occur. An unconditional compiler warning would be nice, too. – Jonathan Leffler Feb 13 '14 at 00:53
25

A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.

Also, strtok() is used by making repeated calls to it until the entire string is tokenized. Some library implementations store strtok()'s internal state in a global variable, which may cause some nasty surprises if strtok() is called from multiple threads at the same time.
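A small sketch of the first pitfall and one way around it: run strtok() on a private copy so the caller's string survives. The helper name count_tokens is hypothetical, and this does nothing about the hidden global state.

```c
#include <stdlib.h>
#include <string.h>

/* Count comma-separated tokens without altering the caller's string:
 * strtok runs on a private heap copy. Returns -1 if allocation fails. */
static int count_tokens(const char *input)
{
    size_t len = strlen(input) + 1;
    char *copy = malloc(len);
    int count = 0;

    if (copy == NULL)
        return -1;
    memcpy(copy, input, len);
    for (char *tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ","))
        count++;
    free(copy);
    return count;
}
```

Calling strtok() directly on a writable array such as char s[] = "a,b,c" leaves s reading as just "a" afterwards, because the first comma has been overwritten.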

The CERT C Secure Coding Standard lists many of these pitfalls you asked about.

Calimo
makes
  • +1 For mirroring my thoughts on strtok() and for mentioning the CERT C Secure Coding Standard. – Jonathan Leffler Jan 03 '11 at 22:28
  • +1, that's a great link, also @Jonathan sorry to quote you but you guys made me think "I definitely need to understand what's going on much more clearly". Hope you don't mind being famous! –  Jan 03 '11 at 22:30
  • 1
    Technically, it is the library function rather than the compiler that stores the state. The big problem is if you isolate a token in your string, and then call a function which, unbeknownst to you, itself calls `strtok()`. – Jonathan Leffler Jan 03 '11 at 22:34
  • @Ninefingers: I'll survive my 15 seconds of infamy :D – Jonathan Leffler Jan 03 '11 at 22:35
  • 1
    `strtok` is *required* to keep its internal status globally even with threads, at least in a POSIX environment where threads are specified. This is because a conforming program could start parsing in one thread and finish in another. Of course MS has their own version of threads where they can specify the different (thread-local) behavior like they do, but it conflicts with POSIX. – R.. GitHub STOP HELPING ICE Jan 04 '11 at 02:47
  • 1
    This is now community wiki which is good, but it still looks like I have to accept an answer, so I'm accepting this one for the CERT C Secure Coding Standard, which provides oodles of useful information. –  Jan 04 '11 at 22:42
  • 1
    I'm puzzled as to why no-one has mentioned `strtok_r` as being (slightly) less confusing in that it doesn't keep global state. – abligh Jan 11 '15 at 21:46
21

In almost all cases, atoi() should not be used (this also applies to atof(), atol() and atoll()).

This is because these functions do not detect out-of-range errors at all: the standard simply says "If the value of the result cannot be represented, the behavior is undefined." So the only time they can be used safely is when you can prove that the input will certainly be within range (for example, if you pass a string of length 4 or less to atoi(), it cannot be out of range).

Instead, use one of the strtol() family of functions.
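A minimal sketch of what that can look like for int, assuming base-10 input and that trailing garbage should be rejected. The name parse_int is made up for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a base-10 int from s. Returns 1 and stores the value in *out on
 * success; returns 0 on empty input, trailing garbage, or a value that
 * does not fit in an int. */
static int parse_int(const char *s, int *out)
{
    char *end;
    long value;

    errno = 0;                      /* strtol only sets errno on failure */
    value = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return 0;                   /* no digits, or junk after them */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                   /* out of range for long or for int */
    *out = (int)value;
    return 1;
}
```

Unlike atoi(), every failure mode here is reported to the caller instead of being undefined or silently yielding 0.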

caf
  • +1 for pointing out the (mostly theoretical, but still) danger of `atoi` and UB. – R.. GitHub STOP HELPING ICE Jan 04 '11 at 03:10
  • Excellent point. There is no reason to use `ato*`. – Stephen Canon Jan 04 '11 at 19:09
  • 2
    It's actually pretty handy if you know what platform your code will run on, which, chances are, you do. E.g. MSVC says *The return value is 0 for `atoi` and `_wtoi`, if the input cannot be converted to a value of that type.*, so it's pretty well-defined. (Also, this is another example where "undefined" and "implementation-defined" actually aren't exactly different -- they can both be defined by the implementation.) – user541686 Nov 12 '11 at 05:09
10

Let us extend the question to interfaces in a broader sense.

errno:

Technically it is not even clear what it is: a variable, a macro, an implicit function call? In practice, on modern systems it is mostly a macro that expands to a function call so that each thread has its own error state. It is evil:

  • because it may cause overhead for the caller to access the value, to check the "error" (which might just be an exceptional event)
  • because it even imposes at some places that the caller clears this "variable" before making a library call
  • because it implements a simple error return by setting global state in the library.

The forthcoming standard gets the definition of errno a bit more straight, but these uglinesses remain.

Jens Gustedt
  • 1
    While it's a bit ugly, there's very little that's error-prone or dangerous about `errno`. It's a macro which evaluates to a modifiable lvalue of type `int`, which is plenty well-defined. As far as I can tell, this means you can take and save its address and access the current value through that address if you like. The only "bad practices" I can think of that `errno` might encourage are (1) modelling your own libraries error reporting on it, and (2) using `&errno` as a cheap universally-portable thread-id. :-) – R.. GitHub STOP HELPING ICE Jan 04 '11 at 03:00
  • 2
    Quoth the standard: "The macro... `errno` which expands to a modifiable lvalue that has type `int`...". So it is clear that it is a macro. – Raedwald Jan 05 '11 at 13:38
  • @Raedwald: yes, but it is not clear how the lvalue is obtained. Nowadays it is usually a function call, I think. – Jens Gustedt Jan 05 '11 at 14:32
  • Surely `errno` is a macro, rather than an `extern int`, precisely to give the implementation that flexibility? Why is it a problem what it is? – Raedwald Jan 05 '11 at 14:43
  • Indeed, I don't see any problem with it being a modifiable lvalue whose definition is up to the implementation. – R.. GitHub STOP HELPING ICE Jan 05 '11 at 16:40
  • @R..: That is a very evil portable thread-id, would it work for compilers/platforms without TLS support? – Matt Joiner Jan 06 '11 at 10:20
  • @Jens Gustedt: Can you expand on the alleged changes in errno definition in forthcoming standard? I'm curious. – Matt Joiner Jan 06 '11 at 10:21
  • 1
    @Matt: it states explicitly that it has thread local storage duration. This is possible, there, since the new standard will have a thread model, quite close to POSIX BTW. – Jens Gustedt Jan 06 '11 at 11:16
  • @Raedwald: problem is perhaps said too much, but the overhead that a simple `errno = 0;` produces is very difficult to estimate for an application. – Jens Gustedt Jan 06 '11 at 11:19
  • 1
    @Matt: If it's a modifiable lvalue of type `int`, taking the address of it is valid, and it can't be the same as another thread's `errno` address. This does not depend on compiler-level TLS. For example `&(*__errno_location())` is the same as `__errno_location()`. If you're writing your own locking code using atomic primitives (C1x, gcc builtins, or asm), `&errno` seems like the safest "owner id" you can get without pulling in dependency on a specific threads implementation (pthreads, solaris, windows, etc.). I agree it's a bit evil though... – R.. GitHub STOP HELPING ICE Jan 06 '11 at 16:17
  • 1
    @R, @Matt: the standard doesn't impose that the lvalue is the same between two subsequent uses of the macro by the same thread, I think. Although I have to admit that it sounds a bit insane to assume differently, but you could imagine that the library in addition of the thread-id keeps track of some other state of the thread and re-assigns a new address here and then. – Jens Gustedt Jan 06 '11 at 17:53
6

There is often a strtok_r, which keeps its state in a caller-supplied pointer instead of a global.

For realloc, if you need to use the old pointer, it's not that hard to use another variable. If your program fails with an allocation error, then cleaning up the old pointer is often not really necessary.
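One way to read the "another variable" advice, as a sketch. The helper name resize_buffer is hypothetical.

```c
#include <stdlib.h>

/* Resize *buf to new_size bytes (new_size > 0). On failure the old block
 * is left intact and still owned by the caller, because the result of
 * realloc goes into a temporary first. Returns 1 on success, 0 on failure. */
static int resize_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return 0;
    *buf = tmp;
    return 1;
}
```

The pattern this avoids is p = realloc(p, n), which loses the only pointer to the old block when realloc returns NULL.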

davep
  • 4
    I was going to say that this should be a comment, not an answer, but you can't comment without rep, so here, have some. – Stephen Canon Jan 03 '11 at 23:09
  • 3
    At the point when you say "often there is `strtok_r()`", you run into "occasionally there isn't" and "what are you going to do when it is not available?". The secondary issue is the assumed platform - the question talks about C99, where `strtok_r()` is not available (nor is `strtok_s()` in general - from TR 24731-1). – Jonathan Leffler Jan 04 '11 at 00:02
4

I would put printf and scanf pretty high up on this list. The fact that you have to get the format specifiers exactly right makes these functions tricky to use and extremely easy to get wrong. It's also very hard to avoid buffer overruns when reading data in. Moreover, the "printf format string vulnerability" has probably caused countless security holes when well-intentioned programmers pass client-supplied strings as the first argument to printf, only to find the stack smashed and security compromised many years down the line.
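A short sketch of the safe shape for the format string case: the untrusted text is passed as an argument to a constant format, never as the format itself. The name format_greeting and the "hello, " prefix are made up for illustration.

```c
#include <stdio.h>

/* Format untrusted text into out (capacity n) as data, never as a format
 * string, so any '%' sequences in it are copied literally.
 * Returns 1 if the result fit completely, 0 if it was truncated. */
static int format_greeting(char *out, size_t n, const char *untrusted)
{
    int needed = snprintf(out, n, "hello, %s", untrusted);
    return needed >= 0 && (size_t)needed < n;
}
```

The vulnerable equivalent would be a call like printf(untrusted), where input such as "%s%n%x" is interpreted as conversion specifiers.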

templatetypedef
  • 7
    if your compiler is not able to tell you "you have used %x with an int number", ditch it, or turn its warning flags on. – BatchyX Jan 03 '11 at 21:44
  • 7
    I disagree. It's only when the format string is computed at runtime instead of a constant string that they become dangerous. GCC even has a nice warning option `-Wformat-nonliteral` for that case (which of course should be combined with `-Werror` to make warnings errors). – Adam Rosenfield Jan 03 '11 at 21:48
  • 2
    while you're at it, just enable -Wall, which also enable -Wformat. – BatchyX Jan 03 '11 at 21:57
  • @Adam: It's only *insecure* if you use runtime-computed format strings (or the `%n` format specifier), but it's still easy to get them wrong (although compiler warnings do help). – j_random_hacker Jan 03 '11 at 22:03
  • 1
    using the %n format specifier is perfectly fine when used correctly. This is even needed in some cases (that's why it's there). – BatchyX Jan 03 '11 at 22:14
  • 3
    The sprintf() function may also lead to buffer overruns if variables are output as something bigger than expected. For example, code might expect that an "unsigned long" will take at most eleven bytes (ten digits and a null terminator) but crash on 64-bit systems where an unsigned long might take 21. – supercat Jan 03 '11 at 22:29
  • The `printf` family (especially `snprintf`) is actually the best, most secure way to construct strings in the C standard library if you're remotely competent at C. I don't think these functions are at all to blame for idiots passing non-format strings in place of the format string argument... – R.. GitHub STOP HELPING ICE Jan 04 '11 at 03:04
  • @R.: I don't think one would have had to be an idiot in the 1990's to e.g. sprintf a %lu into a 16-byte buffer without checking the numerical value of the data in question. IMHO, what's too bad is that there's no standard vgprintf which would accept (in addition to vprintf arguments) a void* and a pointer to a function that takes a void * and a char; such a function could be used to synthesize any of the printf or vprintf variants, along with a bounds-limited sprintf, line-wrapped console printf, or any other desired printf-ish function. – supercat Jan 04 '11 at 15:45
  • @supercat: I would say it was always stupid to use a constant independent of the appropriate `sizeof` expression as the buffer size. As for your theoretical `vgprintf`, it would be nice, but the problem is what level to put it at. I'm sure a lot of people would rather have `FILE` objects with user-provided callbacks, and use `vfprintf` with them. Of course this would be harder to use in the simplest cases, and might impose unwanted restrictions on the possible implementations of `stdio`. – R.. GitHub STOP HELPING ICE Jan 05 '11 at 16:33
  • One more thing to think about regarding `vgprintf`... would the callbacks be required to accept data in whatever unit the implementation sends it in, or would they expect whole fields at a time? In the latter case, this requires dynamic allocation in the implementation and thus has out-of-memory failure conditions. In the former, `vgprintf` has `O(1)` space requirements (although possibly up to 8k or so if exact floating point output is required). – R.. GitHub STOP HELPING ICE Jan 05 '11 at 16:35
  • @R.: How would one use sizeof() to compute the size of a string necessary to accommodate a decimal-printed number? Simply figure (CHAR_BITS * sizeof(unsigned long))>>6+2 or something? As for vgprintf, it would accept a void*, which would be passed to the output function. For fprintf, it would be a FILE*; for sprintf, it would be a char**; for snprintf, it could be a pointer to a local struct with a char*, length so far, and maximum length. No need for dynamic allocation. Whoever calls vgprintf would be responsible for ensuring the passed in pointer was suitable for the passed-in function. – supercat Jan 05 '11 at 16:44
  • @R.: BTW, I think vgprintf is a good way to explain the value of delegates in object-oriented languages. In C, it's necessary to pass separately a function pointer and some data, and manually ensure that functions are only paired with the types of data they expect. Delegates allow a function pointer to be bundled with a piece of data, which will be compile-time validated to ensure it's the proper type for the function. – supercat Jan 05 '11 at 16:49
  • @supercat: I always use `3*sizeof(type)+2` because I don't write text processing code except on POSIX and POSIX-like systems where `CHAR_BIT` is required to be 8. But you can bring in `CHAR_BIT` if you like. – R.. GitHub STOP HELPING ICE Jan 06 '11 at 16:29
4

Any of the functions that manipulate global state, like gmtime() or localtime(). These functions simply can't be used safely in multiple threads.

EDIT: rand() is in the same category it would seem. At least there are no guarantees of thread-safety, and on my Linux system the man page warns that it is non-reentrant and non-threadsafe.
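A sketch of the usual damage limitation for gmtime() in strict C99, where no reentrant variant is available: copy the result out of the shared storage straight away. The name utc_snapshot is hypothetical, and this does not make concurrent calls to gmtime() itself safe.

```c
#include <time.h>

/* Convert t to UTC and copy the result out of gmtime's shared static
 * storage immediately. Returns 1 on success, 0 if conversion failed.
 * The copy protects against later calls overwriting the value, but the
 * call to gmtime itself can still race with other threads. */
static int utc_snapshot(time_t t, struct tm *out)
{
    const struct tm *shared = gmtime(&t);
    if (shared == NULL)
        return 0;
    *out = *shared;
    return 1;
}
```

Holding on to the pointer returned by gmtime() instead of copying is the classic mistake, since a later gmtime() or localtime() call may overwrite the same object.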

j_random_hacker
  • 1
    As far as I know, the only conformant way to make `rand` thread-safe would be to synchronize it with a mutex, which would hurt performance quite a bit. For a given seed, it's supposed to always return the same sequence of pseudo-random numbers, so using a thread-local state could break this semantic in conformant applications which use their own mutex around calls to `rand`. – R.. GitHub STOP HELPING ICE Jan 05 '11 at 16:39
  • ... or which initially use `srand` and `rand` only in the main thread, then after initialization continue to use it in a newly created thread while never again using it in the main thread. – R.. GitHub STOP HELPING ICE Jan 06 '11 at 16:30
4

One of my bêtes noires is strtok(), because it is non-reentrant and because it hacks the string it is processing into pieces, inserting NUL at the end of each token it isolates. The problems with this are legion; it is distressingly often touted as a solution to a problem, but is as often a problem itself. Not always - it can be used safely. But only if you are careful. The same is true of most functions, with the notable exception of gets() which cannot be used safely.
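As a sketch of an alternative that avoids both complaints, tokens can be located with strspn() and strcspn(), leaving the string untouched and keeping all state with the caller. The name next_token and its interface are made up for illustration.

```c
#include <string.h>

/* Find the next token at or after *cursor without modifying the string
 * and without hidden state. On success returns a pointer to the token
 * start, stores its length in *len, and advances *cursor to just past
 * the token. Returns NULL when no tokens remain. */
static const char *next_token(const char **cursor, const char *delims,
                              size_t *len)
{
    const char *start = *cursor + strspn(*cursor, delims);
    if (*start == '\0') {
        *cursor = start;
        return NULL;
    }
    *len = strcspn(start, delims);
    *cursor = start + *len;
    return start;
}
```

Because nothing is overwritten, the delimiter that ended each token is still there to inspect at *cursor, and tokens are not null-terminated, so callers work with the pointer and length pair.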

Community
Jonathan Leffler
  • It's worth pointing out that `strtok()` was probably added because the pattern (`strchr()` or `strpbrk()` to look for a delimiter; overwrite delimiter with `'\0'`; loop until no more delimiters) is so common. – caf Jan 03 '11 at 23:35
  • @caf: that works if you don't need to know what the delimiter was, but not when you do need to know the delimiter. See the question linked in my answer - and the pathetic excuses for apologia from those advocating `strtok()`. It isn't often I use downvotes; there are two answers there with downvotes from me! – Jonathan Leffler Jan 03 '11 at 23:39
  • Well, I tend to think that `strtok()` is a little unfairly maligned, even if some of the criticisms are fair. Perhaps because I've found more than one occasion when it *was* exactly what I wanted - as long as you stay within its intended domain (parsing simple strings like `PATH` variables) rather than trying to parse complex documents with it, I don't think it's too bad. – caf Jan 03 '11 at 23:53
  • 1
    @caf: The problem is that, as soon as somebody wants to take your code and use it in a library setting rather than in `main()`, they run into a nasty surprise and have to rip out `strtok` and replace it with a sane alternative. – R.. GitHub STOP HELPING ICE Jan 04 '11 at 03:08
  • @R.: Well, yes - all of the above should be taken modulo the usual caveats that apply to all non-reentrant functions. – caf Jan 04 '11 at 03:56
4

There's already one answer about realloc, but I have a different take on it. A lot of the time, I've seen people write realloc when they mean free; malloc - in other words, when they have a buffer full of trash that needs to change size before storing new data. This of course leads to a potentially large, cache-thrashing memcpy of trash that's about to be overwritten.

If used correctly with growing data (in a way that avoids worst-case O(n^2) performance for growing an object to size n, i.e. growing the buffer geometrically instead of linearly when you run out of space), realloc has doubtful benefit over simply doing your own new malloc, memcpy, and free cycle. The only way realloc can ever avoid doing this internally is when you're working with a single object at the top of the heap.
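A sketch of the geometric growth mentioned above, assuming a doubling factor and an arbitrary starting capacity of 16. The struct byte_buf and the function byte_buf_append are hypothetical names, not part of any library.

```c
#include <stdlib.h>
#include <string.h>

struct byte_buf {       /* hypothetical growable buffer; start as {NULL, 0, 0} */
    char *data;
    size_t len;
    size_t cap;
};

/* Append n bytes, doubling the capacity as needed so that the total
 * amount of copying stays linear in the final size. Returns 1 on
 * success, 0 on failure (the buffer is left unchanged in that case). */
static int byte_buf_append(struct byte_buf *b, const void *src, size_t n)
{
    if (n == 0)
        return 1;
    if (n > b->cap - b->len) {
        size_t new_cap = b->cap ? b->cap : 16;
        while (new_cap - b->len < n) {
            if (new_cap > (size_t)-1 / 2)
                return 0;               /* doubling would overflow */
            new_cap *= 2;
        }
        char *tmp = realloc(b->data, new_cap);
        if (tmp == NULL)
            return 0;
        b->data = tmp;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 1;
}
```

Growing by a fixed increment instead of a factor is what produces the O(n^2) copying when the block has to move each time.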

If you like to zero-fill new objects with calloc, it's easy to forget that realloc won't zero-fill the new part.

And finally, one more common use of realloc is to allocate more than you need, then resize the allocated object down to just the required size. But this can actually be harmful (additional allocation and memcpy) on implementations that strictly segregate chunks by size, and in other cases might increase fragmentation (by splitting off part of a large free chunk to store a new small object, instead of using an existing small free chunk).

I'm not sure if I'd say realloc encourages bad practice, but it's a function I'd watch out for.

R.. GitHub STOP HELPING ICE
  • My guess is that the object being `realloc()`-ed *can* be extended in-place often enough to make preferring it over `free(); malloc();` worthwhile. And your point about `realloc()`ing down to a smaller size causing fragmentation is weak I think -- yes, it could cause fragmentation, namely the exact amount of fragmentation that would have been caused if we had known the correct size to ask for at the time of the original `malloc()` call. – j_random_hacker Jan 04 '11 at 09:10
  • 1
    No. In the worst case, over-allocate and realloc-down provides fragmentation as bad as if you'd never performed the realloc-down. It will never be as good as allocating the right amount to begin with unless the right amount could only have been obtained by splitting off from a larger free chunk. As for extending in-place, if you're talking about an object that grows over time (like a buffer reading in a long file), you can only grow it geometrically or you'll risk `O(n^2)` copying time. When growing geometrically, extending in-place is *almost never* possible. – R.. GitHub STOP HELPING ICE Jan 04 '11 at 13:26
  • As an example of the fragmentation, suppose you have a program that allocates 100k chunks and reallocates them down to ~1k, and makes no other allocation operations. After `heap_size/100k` (=20000 on many 32-bit systems) allocations, the next will fail, despite only 1% of the heap being "in-use". An allocator can avoid this issue by always moving chunks when they're resized down by a large factor, at the expense of some performance... – R.. GitHub STOP HELPING ICE Jan 04 '11 at 13:29
  • See what you mean about fragmentation. But your scenario is unlikely: usually a realloc-down happens soon after the original allocation, before other allocations happen. Also I don't see another, better way to approach the problem when you don't know the necessary size -- the only possibly sane alternative being to try exponentially larger guesses until one fits, but (a) usually that's more trouble than it's worth, (b) it requires O(log n) allocations and (c) it relies on being able to reacquire the data you're trying to store multiple times (impossible if, say, you're reading from a pipe). – j_random_hacker Jan 04 '11 at 15:37
  • Also not sure why you think growing geometrically and in-place is "almost never" possible. I don't have stats (nor I suspect do you), but I expect a decent proportion of reallocations act on the most recently (re)allocated block, which is likely to be so extendable. I think the strongest thing you could say against `realloc()` here is that this likelihood of in-place extension of the most recently allocated block reduces the chance of actually getting O(n^2) behaviour from a poorly-thought-out (linearly-growing) growth scheme, thereby encouraging this bad practice. – j_random_hacker Jan 04 '11 at 15:47
  • A `malloc` implementation that aims to avoid fragmentation will aim to satisfy all allocations using a free chunk that's as close as possible to the requested size. In dlmalloc-like implementations with a logarithmic free bin scale, the chunk used to satisfy the allocation will never be more than a small factor (1.5x, I believe) larger than the request, unless no free chunks that small are available. Sure it's possible, but I think it's pretty unlikely to have non-top-of-heap large free chunks but no small free chunks. – R.. GitHub STOP HELPING ICE Jan 05 '11 at 15:34
  • As for what to do when you don't know the size that's needed, my favorite approach is almost always to figure it out, even if that means running your computation once and throwing the results away, then running it again. (For instance, first calling `snprintf` with a zero size.) If you really want to do the over-allocate-and-shrink approach, you can simply call `malloc`, `memcpy`, and `free` yourself, and fallback to trying `realloc` if `malloc` fails. This is safe against inducing fragmentation. – R.. GitHub STOP HELPING ICE Jan 05 '11 at 15:51
  • Good conversation :) "it's pretty unlikely to have non-top-of-heap large free chunks but no small free chunks" -- I agree. I'd say you're much more likely to have a stack-like sequence of allocations and deallocations that leave no (or very few) gaps at all, so that there is a high probability that any given `malloc()` call will be allocating from the end of currently allocated memory and so can be extended in-place with an immediately subsequent `realloc()`. Still true even if size binning is used -- it's not the case that certain memory ranges "belong" just to certain allocation sizes. – j_random_hacker Jan 06 '11 at 07:01
  • ... or if it is then the system is inherently memory-wasteful. Re overallocate-and-shrink, I don't understand how calling `malloc()`, `memcpy()` and `free()` myself is less prone to fragmentation than calling `realloc()`, since AFAICT that's exactly what `realloc()` would do itself if it's unable to extend in-place. Could you explain? – j_random_hacker Jan 06 '11 at 07:05
  • Actually, whether or not a stack-like sequence of allocations and deallocations leaves no gaps will depend on when/how free blocks are coalesced so I'll concede that one (with the proviso that this is definitely a common pattern, so an allocation system that created much fragmentation under it would be a poor system). But I'd like to know about the overallocate-and-shrink scenario. – j_random_hacker Jan 06 '11 at 07:10
  • Suppose you have (aside from top-of-heap which we'll ignore for simplicity) just two free chunks A and B of sizes 1k and 10k, respectively, and you want to overallocate 5k and resize it down to 1k. The allocation splits B in half, and after resizing down, you're left with chunks of sizes 1k and 9k. If you'd allocated just 1k to begin with, you'd have it all in one free chunk of size 10k. I call that less fragmentation. The same would apply if you performed the `malloc`/`memcpy`/`free` sequence yourself: you'd end up with a 10k chunk free. – R.. GitHub STOP HELPING ICE Jan 06 '11 at 16:25
  • I see, thanks. I would say you're more likely to *increase* fragmentation with this strategy however, as if there is just a single free chunk (namely the top-of-heap, which would likely be the case if only stack-like allocation/deallocation has occurred so far) `malloc()`+`memcpy()`+`free()` necessarily creates a hole (in your example, of size 5Kb) while in-place shrinking doesn't. – j_random_hacker Jan 06 '11 at 21:42
  • Indeed, memory allocation is a **very hard** problem and no strategy can ever be optimal for all cases and usage patterns. I think it's pretty reasonable to assume most programs will typically have a number of free chunks of various sizes in play most of the time, though. – R.. GitHub STOP HELPING ICE Jan 06 '11 at 22:09
  • @R..: If there are chunks of 1K and 10K, allocating 5K and shrinking to 1K won't be as good as allocating 1K to start with, but allocating 5K and shrinking to 1.1K would leave chunks of 1K and 8.9K, versus 1K, 5K and 3.8K. What would have been better yet would have been if the standard library had defined some routines to use handles, since those are the real key to recovering from fragmentation. – supercat Jul 25 '15 at 21:31
4

How about the malloc family in general? The vast majority of large, long-lived programs I've seen use dynamic memory allocation all over the place as if it were free. Of course real-time developers know this is a myth, and careless use of dynamic allocation can lead to catastrophic blow-up of memory usage and/or fragmentation of address space to the point of memory exhaustion.

In some higher-level languages without machine-level pointers, dynamic allocation is not so bad because the implementation can move objects and defragment memory during the program's lifetime, as long as it can keep references to these objects up-to-date. A non-conventional C implementation could do this too, but working out the details is non-trivial and it would incur a very significant cost in all pointer dereferences and make pointers rather large, so for practical purposes, it's not possible in C.

My suspicion is that the correct solution is usually for long-lived programs to perform their small routine allocations as usual with malloc, but to keep large, long-lived data structures in a form where they can be reconstructed and replaced periodically to fight fragmentation, or as large malloc blocks containing a number of structures that make up a single large unit of data in the application (like a whole web page presentation in a browser), or on-disk with a fixed-size in-memory cache or memory-mapped files.

R.. GitHub STOP HELPING ICE
2

On a wholly different tack, I've never really understood the benefits of atan() when there is atan2(). The difference is that atan2() takes two arguments, and returns an angle anywhere in the range -π..+π. Further, it avoids divide by zero errors and loss of precision errors (dividing a very small number by a very large number, or vice versa). By contrast, the atan() function only returns a value in the range -π/2..+π/2, and you have to do the division beforehand (I don't recall a scenario where atan() could be used without there being a division, short of simply generating a table of arctangents). Providing 1.0 as the divisor for atan2() when given a simple value is not pushing the limits.

Jonathan Leffler
  • 1
    `atan( )` is often used when doing certain trig operations (but you're right that there's always an implicit `1` hiding somewhere, and it wouldn't hurt to make it explicit). – Stephen Canon Jan 03 '11 at 23:11
  • I would note one benefit of `atan( )`, however: it's ~2x faster on a good math library, because it doesn't need to do the divide. I suspect that's why it exists. – Stephen Canon Jan 04 '11 at 01:51
  • 2
    Sometimes `atan` is not used for trigonometry but as a nice smooth (actually analytic) function with desirable monotonicity and boundary conditions. – R.. GitHub STOP HELPING ICE Jan 04 '11 at 03:06
2

Another answer, since these are not really related, rand:

  • it is of unspecified random quality
  • it is not re-entrant
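One way around both points, as a sketch: carry the generator state yourself. The version below reuses the sample linear congruential generator that the C standard gives in its description of rand, with the state moved into a caller-owned variable. The name rand_from_state is made up, and the quality is only as good as that sample generator, so it is not suitable for anything security-related.

```c
/* Reentrant variant of the sample generator from the C standard's
 * description of rand: all state lives in *state, owned by the caller,
 * so two threads with separate state variables cannot interfere.
 * Returns a value in the range 0..32767. */
static int rand_from_state(unsigned long *state)
{
    *state = *state * 1103515245UL + 12345UL;
    return (int)((*state / 65536UL) % 32768UL);
}
```

Seeding is just initialising the state variable, for example unsigned long s = 1; and the same seed always reproduces the same sequence.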
Jens Gustedt
1

Some of these functions modify global state. On Windows this state is kept per thread, so you can get unexpected results. For example, the first call of rand in every thread will give the same result, and it takes some care to make the output pseudorandom yet deterministic (for debugging purposes).

crazylammer
-2

basename() and dirname() aren't threadsafe.

animuson
arsenm
  • These are functions of a single argument that modify their argument. This is like saying avoid += because it's not threadsafe. –  Jan 03 '11 at 23:57
  • 4
    No, they are not threadsafe. From the manpage: "The basename() function returns a pointer to internal static storage space that will be overwritten by subsequent calls. The function may modify the string pointed to by path." – arsenm Jan 04 '11 at 00:04
  • 4
    Whether or not they are threadsafe, `basename` and `dirname` are not part of the C standard library. – Stephen Canon Jan 04 '11 at 00:58