33

In a recent code review, it was claimed that

On select systems, calloc() can allocate more than SIZE_MAX total bytes whereas malloc() is limited.

My claim is that that's mistaken, because calloc() creates space for an array of objects - which, being an array, is itself an object. And no object can be larger in size than SIZE_MAX.

So which of us is correct? On a (possibly hypothetical) system with address space larger than the range of size_t, is calloc() allowed to succeed when called with arguments whose product is greater than SIZE_MAX?

To make it more concrete: will the following program ever exit with a non-zero status?

#include <stdint.h>
#include <stdlib.h>

int main()
{
     return calloc(SIZE_MAX, 2) != NULL;
}
Toby Speight
  • One more quote: *"A good calloc(n, size) will detect products of n * size greater the SIZE_MAX"*. This actually looks like an opinion. The Standard does not mention anything like a "good calloc" and says nothing about detecting the "n * size greater than SIZE_MAX" situation. – user7860670 Oct 08 '18 at 09:54
  • I would assume he means that the argument passed to malloc is the product of the size and the number of objects created, which can be larger than `SIZE_MAX`, whereas calloc takes two parameters for that (so you can allocate `SIZE_MAX` elements of 4 bytes each). – hellow Oct 08 '18 at 09:55
  • @hellow, exactly. I don't believe that's a valid call, because such an array violates the rule that `size_t` can represent the size of any object. – Toby Speight Oct 08 '18 at 09:58
  • I think this is very similar to [Can a standards-conforming string be longer than SIZE_MAX characters?](/q/41870273), but not quite a dupe. – Toby Speight Oct 08 '18 at 10:03
  • DR266 seems to be related. Only found [this](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1061.htm): `DR-266 RM position is sizeof never overflows. DG - ignore the calloc problem. PJ - size_t must be representable, cannot overflow, by definition. Attempt to overflow s/be a constraint violation / undefined behavior.` – KamilCuk Oct 08 '18 at 10:14
  • Here's the link to [DR-266](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_266.htm). – P.P Oct 08 '18 at 10:25
  • I found an [archived comp.lang.c thread](https://groups.google.com/forum/#!topic/comp.std.c/NewGPYEXSCE%5B1-25%5D), but it seems to have fizzled out inconclusively. – Toby Speight Oct 08 '18 at 10:27
  • It seems like it shouldn't even allocate more than PTRDIFF_MAX bytes: https://trust-in-soft.com/objects-larger-than-ptrdiff_max-bytes/ – Petr Skocik Oct 08 '18 at 10:32
  • @PSkocik, that lower limit would be a kindness to careless programmers, but wouldn't affect the cases we're talking about here, where pointers have a bigger range than `size_t`. – Toby Speight Oct 08 '18 at 10:35
  • The conclusion from DR-266 "The committee has deliberated and decided that more than one interpretation is reasonable. Translation limits do not apply to objects whose size is determined at runtime." implies that it's possible to have an allocated object larger than SIZE_MAX. `calloc` (c11 definition) doesn't forbid or say that the allocated size must be <= size_t. But you argued, if its size can't fit in `size_t`, that can't be indexed with `size_t`. So if an implementation supports larger-than-SIZE_MAX-calloc, then I guess implementation-defined behaviour [continued] – P.P Oct 08 '18 at 10:41
  • and one may have to use an implementation-defined type to access such objects - but this is neither supported nor forbidden by the standard. It's certainly a grey area (short of a bug in the standard) with no clear answer. – P.P Oct 08 '18 at 10:41
  • @P.P. that sounds like the correct answer (can be allowed as implementation-defined extension) - please transfer from comment to actual answer. I'll certainly upvote, and probably accept it. – Toby Speight Oct 08 '18 at 10:43
  • @P.P. But you miss the point that calloc can alloc more than SIZE_MAX bytes, but the object itself have a size of SIZE_MAX so it's iterable with size_t. Your DR doesn't cover this case. – Stargateur Oct 08 '18 at 10:44
  • @Stargateur I am not sure where I missed that - that's exactly what I noted in the comment. DR-266 doesn't directly cover that but provides relevant info. – P.P Oct 08 '18 at 10:53
  • @TobySpeight It doesn't really feel like an answer (that's why I posted it as a comment). I'll wait to see if someone comes up with better reasoning before posting it. – P.P Oct 08 '18 at 11:07
  • I don't see how the DR266 is relevant and I don't even understand it. Nowhere in the C standard does it say that `sizeof(a[SIZE_MAX/2][SIZE_MAX/2]);` exceeds an environmental limit. Rather, the only limit in the C standard is a minimum requirement 5.2.4.1 "65535 bytes in an object". Which the code in the DR does not necessarily exceed (it is at least 65534 bytes). Furthermore, the PTRDIFF_MAX versus SIZE_MAX problem is not documented or even recognized in the standard. Seems like we need a DR of the DR to me. – Lundin Oct 08 '18 at 11:40
  • @TobySpeight Good to see the healthy discourse on `calloc()`. I often wondered why `calloc(n,size)` wasn't `void * zalloc(size_t size);` to act like `malloc()` with zero pre-fill. To me, the point of the 2 vs. 1 parameters implied something more was potentially going on. – chux - Reinstate Monica Oct 08 '18 at 19:59
  • @TobySpeight [Rationale for International Standard— Programming Languages— C](http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf) does have "This also restricts the maximum number of elements that may be _declared_ in an array". So _declaring_ such an array (or supposedly a pointer to such an array) is problematic. This serves as a starting point for a counter to my answer - even if I do not think it eventually discounts my answer. – chux - Reinstate Monica Oct 08 '18 at 20:04
  • @PSkocik: I used an OS awhile back where sizeof(size_t) was 2 but sizeof(ptrdiff_t) was 4. sbrk() really could be convinced to give me more than 65535 bytes at once. – Joshua Oct 09 '18 at 02:33

7 Answers

20

Can calloc() allocate more than SIZE_MAX in total?

As the assertion "On select systems, calloc() can allocate more than SIZE_MAX total bytes whereas malloc() is limited." came from a comment I posted, I will explain my rationale.


size_t

size_t is some unsigned type of at least 16 bits.

size_t which is the unsigned integer type of the result of the sizeof operator; C11dr §7.19 2

"Its implementation-defined value shall be equal to or greater in magnitude ... than the corresponding value given below" ... limit of size_t SIZE_MAX ... 65535 §7.20.3 2

sizeof

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. §6.5.3.4 2

calloc

void *calloc(size_t nmemb, size_t size);

The calloc function allocates space for an array of nmemb objects, each of whose size is size. §7.22.3.2 2


Consider a situation where nmemb * size well exceeds SIZE_MAX.

size_t alot = SIZE_MAX/2;
double *p = calloc(alot, sizeof *p); // assume `double` is 8 bytes.

If calloc() truly allocated nmemb * size bytes and p != NULL is true, what part of the spec did this violate?

The size of each element (each object) is representable.

// Nicely reports the size of a pointer and an element.
printf("sizeof p:%zu, sizeof *p:%zu\n", sizeof p, sizeof *p); 

Each element can be accessed.

// Nicely reports the value of an element and the address of the element
for (size_t i = 0; i < alot; i++) {
  printf("value p[%zu]:%g, address:%p\n", i, p[i], (void*) &p[i]);
}

calloc() details

"space for an array of nmemb objects": This is certainly a key point of contention. Does the "allocates space for the array" require <= SIZE_MAX? I found nothing in the C spec to require this limit and so conclude:

calloc() may allocate more than SIZE_MAX in total.


It is certainly uncommon for calloc() with large arguments to return non-NULL - compliant or not. Usually such allocations exceed the memory available, so the issue is moot. The only case I've encountered was with the Huge memory model, where size_t was 16 bits and object pointers were 32 bits.
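
For anyone curious what a given implementation actually does with such a request, a minimal probe along these lines can be used (purely an observation of one implementation's behaviour, not a conformance test; it deliberately never dereferences the result):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // Request (SIZE_MAX / 2 + 1) * 4 bytes in total, which exceeds SIZE_MAX.
    size_t nmemb = SIZE_MAX / 2 + 1;
    void *p = calloc(nmemb, 4);

    // NULL means the implementation refused (or could not satisfy) the request;
    // a non-null return does not by itself prove the whole range is usable.
    printf("calloc(%zu, 4) returned %p\n", nmemb, p);
    free(p);
    return 0;
}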

chux - Reinstate Monica
  • @chux do you have an example libc implementation where this would work? It would require storing the actual size in a type larger than `size_t`, which I very much doubt is the implementation in any `calloc`. I just checked a couple `libc` implementations and both of them put the product in a `size_t`; one checks for overflow and returns NULL, and the other just returns an array of the overflow-truncated size which you'll access out-of-bounds if you try and iterate it (invoking undefined behavior, of course), so it's certainly not safe to do. – Kevin Oct 08 '18 at 18:28
  • @Kevin As in the answer, such a `calloc()` existed in ye old days with a HUGE memory model. A `calloc()` that simply multiplies `nmemb * size` to form the required size, without considering overflow, is a weak implementation of calloc(). That libc weakness, especially given the ready fix you noted in the other implementation, is not a prohibition on what the C spec can allow. OP's title question is not "can a non-NULL return with large operands occur?" - of course it can, with weak lib code. The question is "Can `calloc()` allocate more than `SIZE_MAX` ...?" - the implication being: can `calloc()` do so correctly? – chux - Reinstate Monica Oct 08 '18 at 19:25
18

SIZE_MAX doesn't necessarily specify the maximum size of an object, but rather the maximum value of size_t, which is not necessarily the same thing. See Why is the maximum size of an array "too large"?

But obviously, it isn't well-defined to pass a larger value than SIZE_MAX to a function expecting a size_t parameter. So in theory SIZE_MAX is the limit, and in theory calloc would allow for SIZE_MAX * SIZE_MAX bytes to be allocated.

The thing with malloc/calloc is that they allocate objects without a type. Objects with a type have restrictions, such as never being larger than a certain limit like SIZE_MAX. But the data pointed-at by the result from these functions does not have a type. It is not (yet) an array.

Formally, the data has no declared type, but as you store something inside the allocated data, it gets the effective type of the data access used for storage (C17 6.5 §6).

This in turn means that it would be possible for calloc to allocate more memory than any type in C can hold, because what's allocated does not (yet) have a type.

Therefore, as far as the C standard is concerned, it is perfectly fine for calloc(SIZE_MAX, 2) to return a value different from NULL. How to actually use that allocated memory in a sensible way, or which systems even support such large chunks of memory on the heap, is another story.
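
A small illustration of that "no declared type" point, using ordinary sizes (nothing here exceeds SIZE_MAX; it only shows when an effective type appears):

#include <stdlib.h>

int main(void)
{
    // The allocated storage has no declared type; it is just zero-filled bytes.
    void *raw = calloc(4, sizeof(double));
    if (raw == NULL)
        return EXIT_FAILURE;

    double *d = raw;
    d[0] = 3.14;   // this store gives the accessed bytes the effective type double

    free(raw);
    return EXIT_SUCCESS;
}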

Lundin
  • This does suggest, I think, a peculiar relationship between `SIZE_MAX` and `ptrdiff_t`, since on a system where `calloc` could behave as described, `ptrdiff_t` would have to be large enough to cope. – Steve Summit Oct 08 '18 at 11:31
  • @SteveSummit Yeah, that's the catch, as explained by the accepted answer in the linked post: `SIZE_MAX` and `PTRDIFF_MAX` always have to follow each other, the latter being a signed type. However, given a `SIZE_MAX` of 2^n, the standard doesn't restrict the compiler to having a `PTRDIFF_MAX` which is 2^(n+1). It's just very inconvenient for the compiler to have such a burdensome type system, so in practice it isn't implemented like that. Overall, the C standard doesn't handle the problems with these two types very well, but leaves the thinking "to the implementation". – Lundin Oct 08 '18 at 11:47
  • DOS COMPACT memory model could actually do this if the standard library hadn't defined calloc() in such a way that it would have failed. – Joshua Oct 09 '18 at 02:35
  • An **object without a type** seems to be the key to this conundrum - that seems to be the best reasoning so far, and earns you the tick. – Toby Speight Oct 09 '18 at 07:51
  • @Lundin: Technically, `PTRDIFF_MAX` does not have to follow, and can be arbitrarily smaller than `SIZE_MAX` because the result of subtracting two pointers to the same array object is not required to always be representable as a `ptrdiff_t` value, and in this case the behavior is explicitly undefined. n1548 §6.5.6 ¶9. – Dietrich Epp Oct 09 '18 at 10:39
  • @DietrichEpp It's a weird section. But nobody including the compiler needs to consider the case of UB, so this is hardly proof that PTRDIFF_MAX can be smaller than SIZE_MAX, assuming a conforming implementation. – Lundin Oct 09 '18 at 14:52
  • @Lundin: This section seems to explicitly mention that `ptrdiff_t` might have limited range, do you have any basis for the claim that the range must be similar to the range of `size_t`? – Dietrich Epp Oct 09 '18 at 15:32
  • It's interesting. On a 32-bit machine, you could malloc 3GB, and have char* pointers 3 billion objects apart, so ptr1 - ptr2 is undefined behaviour. However, if you have two int* and int is at least 2 bytes, then ptr1 - ptr2 for int* pointers should always be correct. Tricky. – gnasher729 Nov 26 '18 at 18:18
2

From

7.22.3.2 The calloc function

Synopsis
1

 #include <stdlib.h>
 void *calloc(size_t nmemb, size_t size);

Description
2 The calloc function allocates space for an array of nmemb objects, each of whose size is size. The space is initialized to all bits zero.

Returns
3 The calloc function returns either a null pointer or a pointer to the allocated space.

I fail to see why the space allocated should be limited to SIZE_MAX bytes.

Swordfish
  • My reasoning is that `calloc()` *allocates space for an __array__ of objects*. An array is an object, therefore it must be measurable using a `size_t`. – Toby Speight Oct 08 '18 at 10:09
  • @TobySpeight "But the data pointed-at by the result from these functions does not have a type. It is not (yet) an array." in [this answer](https://stackoverflow.com/a/52701075/2410359) relates to the _An array is an object_ concern. – chux - Reinstate Monica Oct 08 '18 at 19:30
  • @chux: The behavior of `calloc()` was established at a time when it didn't matter whether the storage thereof held any particular kind of `object`, or a union of every kind of `object` that could possibly fit, or no object whatsoever. Since the language includes no means of measuring any object or objects that are created, however, there is no need to have a type capable of holding such measurement. – supercat Oct 09 '18 at 15:28
2

If a program exceeds implementation limits, behavior is undefined. This follows from the definition of an implementation limit as a restriction imposed upon programs by the implementation (3.13 in C11). The standard also says that strictly-conforming programs must adhere to implementation limits (4p5 in C11). But this also applies to programs in general, because the standard does not say what happens when most implementation limits are exceeded (so it is the other kind of undefined behavior, where the standard does not specify what happens).

The standard also does not define what implementation limits may exist, so this is a bit of a carte blanche, but I think it is reasonable that the maximum object size is actually relevant to object allocations. (The maximum object size is typically smaller than SIZE_MAX, by the way, because the difference of pointers-to-char within the object must be representable in ptrdiff_t.)

This leads us to the following observation: A call to calloc (SIZE_MAX, 2) exceeds the maximum object size limit, so an implementation could return an arbitrary value while still conforming to the standard.

Some implementations will actually return a pointer which is not null for a call like calloc (SIZE_MAX / 2 + 2, 2) because the implementation does not check whether the multiplication result fits into a size_t value. Whether this is a good idea is a different matter, given that the implementation limit can be checked so easily in this case, and there is a perfectly fine way to report errors. Personally, I consider the lack of overflow checking in calloc an implementation bug, and have reported bugs to implementors when I saw them, but technically, it's merely a quality-of-implementation issue.
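
The check really is cheap. A sketch of a calloc-style wrapper (hypothetical, not taken from any particular libc) shows all it takes:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical wrapper that makes the overflow check explicit: reject any
// request whose total size does not fit in size_t.
void *checked_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;                 // nmemb * size would exceed SIZE_MAX

    void *p = malloc(nmemb * size);
    if (p != NULL)
        memset(p, 0, nmemb * size);  // calloc's "all bits zero" initialization
    return p;
}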

For variable-length arrays on the stack, the rule about exceeding implementation limits resulting in undefined behavior is more obvious:

size_t length = SIZE_MAX / 2 + 2;
short object[length];

There is really nothing an implementation can do here, so it has to be undefined.

Florian Weimer
  • Can you back that up with references to the standard? – Werner Henze Oct 08 '18 at 10:51
  • And why do you bring in implementation limits? In J.3.12 in the C standard I do not see any implementation defined limits for calloc other than "Whether the calloc, malloc, and realloc functions return a null pointer or a pointer to an allocated object when the size requested is zero (7.22.3)." – Werner Henze Oct 08 '18 at 10:58
  • As noted in [DR-266](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_266.htm), translation limits do not apply to runtime/allocated objects. So not sure if translation limits apply to `calloc`. – P.P Oct 08 '18 at 11:00
  • @WernerHenze I added some references. Most of this is undefined because the standard does not say what happens, so it is difficult to back up things with references. – Florian Weimer Oct 08 '18 at 11:08
  • `SIZE_MAX` does not necessarily exceed the maximum object size. It is fine for an implementation to have a `PTRDIFF_MAX` that is 2^33 signed while at the same time it has a `SIZE_MAX` which is 2^32 unsigned. It's just very inconvenient for the compiler to have a type system like that, but the standard doesn't care [didn't even consider the problem]. – Lundin Oct 08 '18 at 11:21
  • C specifies `calloc ()` as allocating space for an arrays of objects. `calloc (SIZE_MAX, 2)` allocates `SIZE_MAX` objects. Each object's size is <= `SIZE_MAX`. Had the spec said it allocated an array object of `size` elements each of size `nmemb`, then "exceeds the maximum object size limit" concern would certainly apply. – chux - Reinstate Monica Oct 08 '18 at 14:07
  • @Lundin: Curiously, C11 has imposed a rule that `ptrdiff_t` must be at least 17 bits, even on a freestanding implementation with less than 32K total of storage (the size of object *hosted* implementations are required to support was increased to 65,535 bytes, but I'm not sure I see the point--regardless of what the Standard says, implementations that can practically support objects that size will do so, and those that can't, won't). In any case, I'm really unsure what purpose a 17-bit `ptrdiff_t` would serve on an implementation with less than 32K of total storage. – supercat Oct 09 '18 at 15:22
2

Per the text of the standard, maybe, because the standard is (some would say intentionally) vague about this sort of thing.

Per 6.5.3.4 ¶2:

The sizeof operator yields the size (in bytes) of its operand

and per 7.19 ¶2:

size_t

which is the unsigned integer type of the result of the sizeof operator;

The former cannot be satisfied in general if the implementation admits any type (including array types) whose size is not representable in size_t. Note that, regardless of whether you interpret the text about the pointer returned by calloc pointing to "an array", there is always an array involved with any object: the overlaid array of type unsigned char[sizeof object] which is its representation.
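
To make the "overlaid array" point concrete with an ordinary object (the exotic sizes are beside the point here):

#include <stdio.h>

int main(void)
{
    double x = 1.0;
    // x's representation can always be read as an unsigned char[sizeof x].
    unsigned char *rep = (unsigned char *)&x;

    for (size_t i = 0; i < sizeof x; i++)
        printf("%02x ", rep[i]);
    putchar('\n');
    return 0;
}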

At best, an implementation that allows the creation of any object larger than SIZE_MAX (or PTRDIFF_MAX, for other reasons) has fatally bad QoI (quality of implementation) problems. The claim on code review that you should account for such bad implementations is bogus unless you are specifically trying to ensure compatibility with a particular broken C implementation (sometimes relevant for embedded, etc.).

R.. GitHub STOP HELPING ICE
1

Just an addition: with a tiny bit of maths you can show that SIZE_MAX * SIZE_MAX = 1 when evaluated according to C rules (the multiplication is done in size_t, i.e. modulo SIZE_MAX + 1).

However, calloc (SIZE_MAX, SIZE_MAX) is only allowed to do one of two things: Return a pointer to an array of SIZE_MAX elements of SIZE_MAX bytes, OR return NULL. It is NOT allowed to calculate the total size by just multiplying the arguments, getting a result of 1, and allocating one byte, cleared to 0.
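
That wraparound is easy to observe directly; a small demonstration (assuming size_t is not narrower than int, which holds on any common implementation):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    // The multiplication is done in size_t, so it wraps modulo SIZE_MAX + 1.
    size_t product = (size_t)SIZE_MAX * SIZE_MAX;
    printf("%zu\n", product);   // prints 1
    return 0;
}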

gnasher729
0

The Standard says nothing about whether it might be possible for a pointer to somehow be created such that ptr+number1+number2 could be a valid pointer, but number1+number2 would exceed SIZE_MAX. It certainly allows for the possibility of number1+number2 exceeding PTRDIFF_MAX (though for some reason C11 has decided to require that even implementations with a 16-bit address space must use a ptrdiff_t of at least 17 bits, which in practice means a 32-bit type).

The Standard does not mandate that implementations provide any means of creating pointers to such large objects. It does, however, define a function, calloc(), whose description suggests that it could be asked to attempt to create such an object, and that it should return a null pointer if it can't create the object.

The ability to allocate any kind of object usefully, however, is a Quality of Implementation issue. The Standard would never require that any particular allocation request succeed, nor would it forbid an implementation from returning a pointer that might turn out to be unusable (in some Linux environments, a malloc() might yield a pointer to an over-committed region of address space; an attempt to use the pointer when insufficient physical storage is available could cause a fatal trap). It would certainly be better for a non-capricious implementation of calloc(x,y) to return null if the numerical product of x and y exceeds SIZE_MAX than for it to yield a pointer which can't be used to access that number of bytes. The Standard is silent, however, on whether returning a pointer that can be used to access x objects of y bytes each should be considered better or worse than returning null. Each behavior would be advantageous in some situations, and disadvantageous in others.

supercat