calloc(): Do the individual values matter for performance?

Question

I'm currently writing an embedded application in C where performance is critical.

Currently, I'm allocating lots of zeroed memory like this: calloc(1, num_bytes). However, num_bytes is simply calculated earlier in the code as the product of the number of items and the size of each item, because that code used to call malloc.

calloc seems to be the only memory allocation function of the -alloc family that takes two arguments for the size. Is there a good reason for this? Are there performance implications for specifying different arguments? What was the rationale for choosing this argument layout?

There probably aren't any noticeable differences, especially not if you enable more advanced optimizations in your compiler. — obataku, Aug 27 '12 at 20:41
duplicate question: http://stackoverflow.com/questions/7581192/how-did-malloc-and-calloc-end-up-with-different-signatures http://stackoverflow.com/questions/7536413/why-calloc-takes-two-arguments-while-malloc-only-one — Jim Balter, Aug 27 '12 at 21:07
@Jim - Not exactly duplicates; those are "Why?" questions, whereas mine is "Does it matter?" — Thomas O, Aug 27 '12 at 22:02
You also asked about the rationale. The answer to your question (no, it does not matter) can be found there by careful reading of the rationale. — Jim Balter, Aug 28 '12 at 00:02

score 3 · Answer 1 · edited May 23 '17 at 12:04

One advantage of having the separate arguments is that it automatically guards against integer overflow:

// On a 32-bit system, the calloc call will almost certainly fail, but the
// size passed to malloc wraps around, so malloc succeeds with a far smaller
// block than intended, likely leading to crashes and/or security holes
// (e.g. if the number of items to allocate came from an untrusted source).
void *a = calloc(64, 67108865);           // 2^32/64 + 1
void *b = malloc((size_t)64 * 67108865);  // wraps to 64 bytes when size_t is 32 bits
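
To get the same protection when calling malloc directly, the multiplication can be checked by hand. The following is a minimal sketch, not how any particular calloc is implemented; `malloc_array` is a hypothetical helper name, not a standard function.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the standard library): allocate room for
 * `count` items of `size` bytes each, refusing any request whose total byte
 * count would not fit in a size_t. The memory is NOT zeroed. */
void *malloc_array(size_t count, size_t size)
{
    /* count * size overflows exactly when count > SIZE_MAX / size. */
    if (size != 0 && count > SIZE_MAX / size) {
        return NULL;
    }
    return malloc(count * size);
}
```

With this helper, a call such as `malloc_array(64, 67108865)` returns NULL when size_t is 32 bits wide instead of silently handing back a 64-byte block.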

For large allocations, there can also be a performance advantage to calling calloc instead of combining malloc and memset, since the calloc implementation can use its internal knowledge of the heap to avoid unnecessary work or to improve cache performance.

For example, if the allocator decides to use an OS function such as mmap(2) or VirtualAlloc to acquire more virtual address space, that memory will come pre-zeroed for security reasons. See this question for a detailed explanation. For small allocations, you're unlikely to notice much of a difference.

Some calloc implementations just call malloc and memset internally, so there's no advantage other than a potential overflow check.
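
For reference, the two approaches being compared look roughly like this. It is only a sketch: the function names are made up for illustration, and whether the calloc version is actually faster depends on the allocator and the allocation size.

```c
#include <stdlib.h>
#include <string.h>

/* Zeroed allocation in one step. The allocator is free to skip the clearing
 * pass when it knows the memory is already zero (e.g. fresh pages from the
 * OS), and it can check count * sizeof(int) for overflow. */
int *zeroed_with_calloc(size_t count)
{
    return calloc(count, sizeof(int));
}

/* Same observable result in two steps. The explicit memset asks for every
 * byte to be written, and nothing here checks count * sizeof(int) for
 * overflow. */
int *zeroed_with_memset(size_t count)
{
    int *p = malloc(count * sizeof(int));
    if (p != NULL) {
        memset(p, 0, count * sizeof(int));
    }
    return p;
}
```

Both functions return a buffer whose bytes are all zero (or NULL on failure), so the choice between them is about cost and safety, not about the result.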

edited May 23 '17 at 12:04

Community

answered Aug 27 '12 at 20:48

Adam Rosenfield

Interesting; the Visual Studio 10.0 runtime does do an overflow check. – Adam Rosenfield Aug 27 '12 at 20:58
@Dietrich: I don't know what version of glibc you're looking at, but in glibc-2.12.2's `malloc/malloc.c`, the implementation of `calloc` in `public_cALLOc` does not look like what you wrote, and it does do an overflow check. – Adam Rosenfield Aug 27 '12 at 21:04
Indeed so. I was looking at `cALLOc` not `public_cALLOc`. – Dietrich Epp Aug 27 '12 at 21:11
It's not that unlikely ... see http://stackoverflow.com/questions/2688466/why-mallocmemset-slower-than-calloc – Jim Balter Aug 28 '12 at 00:11
@Jim: Thanks, that briefly crossed my mind but I didn't think deeply about it. See updated answer. – Adam Rosenfield Aug 28 '12 at 14:50

score 1 · Answer 2 · answered Aug 27 '12 at 20:48

I suppose that the argument layout of calloc() is meant to allow the allocation of objects larger than a single size_t parameter can represent (SIZE_MAX might be as small as 65535, i.e. just under 64 KiB).

Whether performance is affected depends mostly on how the arguments are passed to calloc() in your particular environment. Usually, more arguments mean more data to be transferred between the caller and the callee; for example, an extra argument may have to be pushed onto the stack, costing a couple of extra instructions. But I believe that this extra overhead won't be a bottleneck in your program, especially when compared to the execution time of the memory allocator itself.

If you're worried about the performance of calloc(), malloc() might be faster simply because it does not initialize the allocated buffer as calloc() does.

answered Aug 27 '12 at 20:48

alecov

Actually, `calloc` does not always initialize the buffer; it often merely returns memory that is either already zero, or will become zero when it is read. – Dietrich Epp Aug 27 '12 at 20:57
@Dietrich Epp: True. As always, the particular implementation of an interface might be absolutely different from what we suppose it should be, while still conforming to the documented behavior. Thanks for pointing that out. – alecov Aug 27 '12 at 21:02
The overhead of passing two arguments rather than one is trivial and surely not what the OP was asking about -- the question was about whether the *values* of the arguments to calloc matter. OTOH, calloc *can* be far faster than malloc + memset, for the reason Dietrich noted. – Jim Balter Aug 28 '12 at 00:09
This is impossible. `size_t` must be able to store the size of any object, and if `calloc` could allocate objects larger than that, the implementation would be failing to meet the requirements on `size_t`. – R.. GitHub STOP HELPING ICE Aug 28 '12 at 13:07
@R..: `size_t` must be able to store the size of any ONE object. `calloc` allocates arrays of objects. – Zan Lynx Aug 28 '12 at 14:54
@Jim Balter: I don't see how it is _surely_ not what the OP was asking about. Actually, I believe this is _precisely_ one of the things he wanted to discuss. Nonetheless, the answer I provided tries to address both of the issues, and it has been clearly stated that the argument passing overhead is the least important of the concerns. – alecov Aug 28 '12 at 17:41
@R..: As far as I know, your statement is false. `size_t` is the result of a `sizeof` expression. Since `calloc()` returns a pointer, the restrictions on `size_t` do not apply to the pointed-to buffer (since the operator cannot be applied directly to it). I haven't found anything in ISO 9899:1999 (or 1990) saying that the _only_ object sizes allowed by an implementation are restricted to `size_t`. Please direct me if you had. – alecov Aug 28 '12 at 17:49
@ZanLynx: An array of objects is an object in itself. This is clear per the C standard. I agree however that it's unclear that objects larger than `size_t` are disallowed. – R.. GitHub STOP HELPING ICE Aug 28 '12 at 18:16
*Surely* the OP is not concerned about the small number of nanoseconds taken by passing two arguments instead of one, which the OP is *surely* already aware of. The OP stated using `1 * num_bytes` instead of the actual size and number of objects, and wants to know if there's any performance difference between the two. – Jim Balter Aug 29 '12 at 19:07

score 0 · Answer 3 · answered Aug 27 '12 at 20:48

> I'm currently writing an embedded application in C where performance is critical.

I think that optimizing calloc should be a fairly low priority. But check whether it's possible to use malloc instead (avoiding the zero-initialization), to avoid allocating altogether by re-using memory, and possibly to allocate memory padded to a platform-specific boundary.

These are all very minor optimizations, though (except maybe for re-using allocations). I'd focus on the algorithm instead.
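
As a rough illustration of re-using memory instead of allocating on every call, here is a sketch of a scratch buffer that is allocated once and grown only when needed. The `struct scratch` type and both function names are hypothetical, and a real embedded project might prefer a static, fixed-size buffer instead.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical scratch buffer that is kept around between uses.
 * Initialize it as { NULL, 0 } before the first call. */
struct scratch {
    unsigned char *data;
    size_t capacity;
};

/* Return a zeroed buffer of at least `size` bytes (size is expected to be
 * nonzero), reusing the previous allocation when it is large enough.
 * Returns NULL on allocation failure, leaving the old buffer intact. */
void *scratch_get_zeroed(struct scratch *s, size_t size)
{
    if (size > s->capacity) {
        unsigned char *grown = realloc(s->data, size);
        if (grown == NULL) {
            return NULL;
        }
        s->data = grown;
        s->capacity = size;
    }
    if (size > 0) {
        memset(s->data, 0, size);
    }
    return s->data;
}

/* Free the buffer and reset the struct so it can be used again. */
void scratch_release(struct scratch *s)
{
    free(s->data);
    s->data = NULL;
    s->capacity = 0;
}
```

After the first call, later requests of the same or smaller size only pay for the memset, not for a trip through the allocator.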
