15

I'm trying to convince a colleague, citing the specific parts of the C99 standard, that the following is undefined behavior:

int *p = malloc(1);
p[0] = 0;

But I cannot find the specific parts in the standard which clearly ensure that this is undefined. I'm looking specifically for the logical steps in the standard which lead from these lines to the conclusion: undefined behavior. Is it the conversion from void * to int * in the first line? The assignment in the second line?

The only relevant part I can find about malloc is that it returns a suitably aligned pointer (7.20.3):

The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (...)

I tried grepping the standard for "space", but there's too much noise from "white space" and other lexical matters.

anol
  • 9
    you are allocating 1 byte of memory and then write an `int` (4 bytes ?). That should be enough to make it undefined behaviour. – Manos Nikolaidis Nov 18 '15 at 09:27
  • 3
    I edited the question to clarify that I'm specifically looking for the parts in the standard which lead to this conclusion, because although I know it is UB, I cannot find the appropriate justification in the standard. – anol Nov 18 '15 at 09:31
  • 1
    Another case of UB occurs if you get a _null pointer_ from `malloc`. You should always test the result of functions which can encounter an error. – too honest for this site Nov 18 '15 at 09:43
  • 2
    @Magisch "p[0] is equivalent to p" – no it isn't. "and pointers are safely automatically used correctly in c" – no they aren't, nothing could be further from the truth. – The Paramagnetic Croissant Nov 18 '15 at 09:57
  • 1
    @TheParamagneticCroissant Functionally, they are. And in C, you never need to cast malloc pointers. They are safely and automatically promoted on use. – Magisch Nov 18 '15 at 10:02
  • 2
    @Magisch "Functionally, they are" – **no, they aren't.** If `p` is a pointer, then `p[0]` is the same as `*p`. You can't possibly be asserting that the pointer is always the same thing as the object it points to? Also, I didn't say that you need to cast `void *` because you don't. It's just that "pointers are safely automatically used correctly in c" **doesn't mean that,** because pointers are **unsafe** in C. C is not a managed language – and "safe pointers" have **nothing to do** with such implicit type conversions. – The Paramagnetic Croissant Nov 18 '15 at 10:06
  • @TheParamagneticCroissant Pardon my ignorance here, but in all my use cases, in a case where p is anything p[0] means literally p, because the [0] after that is the increment. 0 as increment means at the same starting position as p itself, thus the same thing. Is there an implicit dereference here or what? And by "safe" I take your point. I meant safe as in you can use it as intended without messing with it (explicit cast) beforehand. – Magisch Nov 18 '15 at 10:10
  • 2
    @Magisch `p[i]` means `*(p + i)`. If you want `p + i`, that's spelled `&p[i]`. – The Paramagnetic Croissant Nov 18 '15 at 10:11
  • 1
    @TheParamagneticCroissant so p[0] means *(p + 0) or in other words, *p. Alright. Then whats does simply p mean? *p again? Then whats incorrect at my original statement. – Magisch Nov 18 '15 at 10:16
  • 1
    @Magisch "Then whats does simply p mean? *p again?" – no, it's just `p`. If you don't use the `[]` operator, you don't dereference. – The Paramagnetic Croissant Nov 18 '15 at 10:36
  • 2
    @Magisch `p` is an address. `p[0]` is the data at that address. Two most probably very different things, especially since `int`s and addresses aren't even necessarily the same size. – 8bittree Nov 18 '15 at 19:55

6 Answers

18

Adding from 7.20.3.3 The malloc function to your quote:

The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.
The malloc function returns either a null pointer or a pointer to the allocated space.

So there are two possible sources of undefined behavior. The first is writing past the end of the buffer: an int is guaranteed to be at least 16 bits wide, but you are allocating just 1 byte, which is 8 bits on almost all systems. The second is a possible dereference of a null pointer.

From 6.5.2.1 Array subscripting, p[0] = 0 is identical to *(p + 0) = 0, which is effectively *p = 0. The type of *p is int, so the assignment stores into sizeof(*p) * CHAR_BIT bits, not all of which may belong to the allocated buffer, causing the UB.

There is no undefined behavior in the first line of code (the initialization); the UB, if any, is in the second line (the dereference).

But on machines where CHAR_BIT is large and sizeof(int) is 1, this is well-defined behavior, provided malloc doesn't return a null pointer.
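A minimal sketch of how both hazards could be guarded against. The helper name `try_store_int` is hypothetical (it is not from the question): it refuses requests smaller than `sizeof(int)` and checks for a null pointer before writing.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate nbytes and store 0 through an int pointer
 * only when that is well defined. Returns 1 on success, 0 otherwise. */
int try_store_int(size_t nbytes)
{
    if (nbytes < sizeof(int))
        return 0;           /* buffer too small to hold an int object */

    int *p = malloc(nbytes);
    if (p == NULL)
        return 0;           /* allocation failed: nothing to dereference */

    p[0] = 0;               /* all sizeof(int) bytes lie inside the buffer */
    free(p);
    return 1;
}
```

With this guard, `try_store_int(1)` only performs the store on an implementation where `sizeof(int)` is 1.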

Mohit Jain
  • I think the OP is looking for something that explicitly states that writing an int to this buffer counts as overwriting its size (and is then UB). – Oliver Charlesworth Nov 18 '15 at 09:42
  • 6
    The only guarantee on the size of `int` is that it's at least 16 bits, right? Is it possible for `CHAR_BIT` to be 16, and `sizeof(int) == sizeof(char)`? The code would be legal on that strange C implementation. re: @Oliver's point: It's in the standard somewhere that writing outside of objects is UB. I don't have the standard on speed-dial to quote snippets of it, myself, though. – Peter Cordes Nov 18 '15 at 09:43
  • I think the exact line you are looking for is the in paragraph 1 of the same chapter: *"... The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object **or an array of such objects in the space allocated** ..."* – user694733 Nov 18 '15 at 09:47
  • 6
    @PeterCordes Yes you are right. There were machines (Crays) where CHAR_BIT was 32. That means sizeof(int) = sizeof(char) = 1. That means `malloc(1)` is equivalent to `malloc(sizeof(int))` But on ILP32 and LP64 system this is clearly UB. – Mohit Jain Nov 18 '15 at 09:47
  • 2
    @Peter Cordes Actually there isn't anything like "An int has to have 16 bits in the standard". The size requirements for all types but char are only given in multiple of the size of char. The standard guarantees that sizeof(int) >= 2. – Vincent Nov 18 '15 at 09:53
  • 2
    @Vincent As per the C spec, an `int` must support at least the range `[−32767, +32767]`, which maps to a minimum requirement of 16 bits. And `sizeof(int)` is `NUMBER_OF_BITS / CHAR_BIT` – Mohit Jain Nov 18 '15 at 09:55
  • 10
    @Vincent: Please post the paragraph where the standard requires `sizeof(int) >= 2`. The only requirement is `sizeof(char) == 1`. And PeterCordes did not state "an `int` has to have 16 bits". He just stated it has **at least** 16 bits, which follows from the minimum required range for `int` (+/-32767). The result of `sizeof(int)` follows from this and `CHAR_BIT`. (He confused size and width, though) – too honest for this site Nov 18 '15 at 09:56
  • 1
    I think the fact that `malloc()` can return `NULL` is *technically* sufficient. Even replacing `int *p = malloc(1);` by `int *p = malloc(sizeof(int));` this would *technically* be UB, as `malloc()` can return null, which makes the behaviour undefined in some cases, which means it is not defined in all cases, which is (I think) sufficient to call it UB. – abligh Nov 18 '15 at 10:40
  • @abligh Thanks. You are right. I added further details to cover up a case if question is later changed to `if(p) p[0] = 0;` or something similar. – Mohit Jain Nov 18 '15 at 11:36
  • 1
    Incidentally in lots of modern machines, malloc never returns NULL. If you run out of memory, you crash accessing the newly allocated memory. Depending on what you are doing, this might not be the best idea, but it really is the best behavior for desktops. – Joshua Nov 18 '15 at 16:51
  • 1
    There are modern systems with `CHAR_BIT` of 16 or 32: DSPs! – Deduplicator Nov 18 '15 at 20:18
  • Don't you also have to find where the standard says you can't access an object that doesn't have sufficient space allocated to hold it (or something like that)? – user253751 Nov 18 '15 at 20:58
  • 1
    "`p[0] = 0` is equivalent to `*p = 0`" - actually it is equivalent to `*(p+0) = 0`, and it's not completely clear that `p+0` is well-defined – M.M Nov 18 '15 at 20:59
  • @Rhymoid It would fill those bits with `0` because RHS in the assignation is `0`. – Mohit Jain Nov 19 '15 at 04:40
  • Thanks @M.M. I agree with you, I did not think of it that way. Nowhere in the spec is it mentioned that E[0] can be directly transformed to *E – Mohit Jain Nov 19 '15 at 05:00
  • 1
    @MohitJain You're right. For some reason, I thought `p[0]` was a `int *` rather than an `int`. –  Nov 19 '15 at 09:21
  • @MohitJain "So there are 2 possible sources of undefined behavior,"- the third one you didn't mention is reading values from the memory whose value is indeterminate – Giorgi Moniava Jan 12 '16 at 22:14
  • @Giorgi There is no read in either of the lines in OP's code. – Mohit Jain Jan 13 '16 at 05:32
  • @MohitJain Not in OPs code but I said in general, the way you mentioned "So there are 2 possible sources of undefined behavior," – Giorgi Moniava Jan 13 '16 at 11:03
  • I was talking about OP's code with above reference in mind. – Mohit Jain Jan 13 '16 at 17:42
7
int *p = malloc(1);
p[0] = 0;

This is undefined behaviour because you have allocated 1 byte and, in the assignment above, you are trying to write four bytes (assuming int is four bytes). This holds true as long as sizeof(int) > 1.
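Whether `sizeof(int) > 1` holds can be checked on a given implementation. The sketch below uses hypothetical helper names; it relies only on `CHAR_BIT` from `<limits.h>` and on the minimum range the standard requires of `int`.

```c
#include <limits.h>
#include <stddef.h>

/* Bits of storage occupied by an int (including any padding bits). */
size_t int_storage_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Nonzero when malloc(1) provides enough space for one int,
 * which requires sizeof(int) == 1 and therefore CHAR_BIT >= 16. */
int one_byte_holds_int(void)
{
    return sizeof(int) == 1;
}
```

On a typical ILP32 or LP64 system `one_byte_holds_int()` is expected to return 0, which is the case where the assignment in the question is undefined.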

Giorgi Moniava
5

6.5.3.2 Address and indirection operators

...

Semantics

The unary & operator yields the address of its operand. If the operand has type ‘‘type’’, the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.

The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

The [] operator is an implied * operator on the pointer. The value assigned to the pointer is invalid for an int as long as sizeof( int ) > 1.

The behavior is undefined.

And NULL is an invalid pointer, so this also covers malloc() returning NULL.

Deduplicator
Andrew Henle
  • "invalid value" doesn't seem to be defined by the standard anywhere, all I can see is non-exhaustive lists of examples (mostly in non-normative situations). For example [this thread](http://stackoverflow.com/questions/25390577/is-memcpya-1-b-1-0-defined-in-c11) asks about its meaning indirectly. Also, "invalid value for ____" doesn't seem to be used anywhere; a value is either invalid or it isn't, and certainly `malloc(1)` (if not returning null) returns a valid value. – M.M Nov 18 '15 at 20:46
  • The first paragraph you quote doesn't apply to this code; it describes the use of the `&` operator which does not occur in this code. In your bolded text "the operand" means "the operand of `&`" . – M.M Nov 18 '15 at 20:57
  • @M.M - I included the first paragraph for the references to the `[]` operator being an implied unary `*` operator, since that's what the second paragraph uses to specify UB. (cont) – Andrew Henle Nov 18 '15 at 21:51
  • The standard states: *Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.* and, in **7.1.4**: *If an argument to a function has an invalid value (such as a value outside the domain of the function, **or a pointer outside the address space of the program**...* Dereferencing a pointer to one byte as something larger could very well be effectively "a pointer outside the address space of the program" and hence invalid. – Andrew Henle Nov 18 '15 at 21:52
  • The relation between `*` and `[]` is explicitly described in 6.5.2.1 (Array subscripting), so that section could be quoted as direct evidence – M.M Nov 19 '15 at 05:09
  • But 6.5.2.1 doesn't explicitly label the `[]` operator as an implied unary `*` as that first paragraph of 6.5.3.2 does, and it's the second paragraph of 6.5.3.2 that makes using the `*` operator on an "invalid" pointer undefined behavior. – Andrew Henle Nov 19 '15 at 09:37
5

Quotes from the standard:

J.2, Undefined behavior: The behavior is undefined in the following circumstances: ... An array subscript is out of range, even if an object is apparently accessible with the given subscript

6.2.5, Types, 20: An array type describes a contiguously allocated nonempty set of objects.

As long as sizeof(int) > 1, your malloc(1) did not allocate a nonempty set of objects, so the array size as allocated is zero and with p[0] you access with a subscript that is out of range. QED.
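The "array of zero ints" reading can be illustrated with a small sketch. The function name `int_capacity` is hypothetical; it simply counts how many complete `int` objects fit in a given number of bytes.

```c
#include <stddef.h>

/* Number of complete int objects that fit in nbytes of storage. */
size_t int_capacity(size_t nbytes)
{
    return nbytes / sizeof(int);
}
```

Under this view, `int_capacity(1)` is 0 whenever `sizeof(int) > 1`, so subscript 0 is already out of range for the space returned by `malloc(1)`.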

Paul Ogilvie
2

The code *p is covered by (at least - other sections may also cover it) 6.3.2.1/1:

An lvalue is an expression (with an object type other than void) that potentially designates an object; if an lvalue does not designate an object when it is evaluated, the behavior is undefined.

The definition of "object" is:

region of data storage in the execution environment, the contents of which can represent values

The lvalue *p designates sizeof(int) bytes of space, however there is only 1 byte of storage which can represent values (in other words, unallocated space cannot form part of an object). So, if sizeof(int) > 1, then *p does not designate an object.


For the actual code in the question, p[0]: this is equivalent to *(p+0). It's unclear to me from 6.5.6/8 whether p + 0 causes UB or not. But this is moot because even if it doesn't cause UB, dereferencing the result does, as shown above; so p[0] causes UB either way.
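For contrast, here is a minimal sketch of the well-defined case, where the allocation is large enough for `*p` to designate a whole `int`. The function name is hypothetical; it checks that `p[0]` and `*(p + 0)` refer to the same object.

```c
#include <stdlib.h>

/* Returns 1 if p[0] and *(p + 0) name the same int, -1 if malloc fails. */
int subscript_matches_indirection(void)
{
    int *p = malloc(sizeof *p);     /* enough room for exactly one int */
    if (p == NULL)
        return -1;

    p[0] = 42;                      /* defined: *p designates a whole int */
    int same_value = (*(p + 0) == 42);
    int same_address = (&p[0] == p + 0);

    free(p);
    return same_value && same_address;
}
```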

M.M
  • Can not find any reference, but looking at the definition of addition and subtraction, can't compiler very safely remove addition or subtraction with constant zero. (Having read 6.5.6.8 many times I am not sure in my argument) – Mohit Jain Nov 19 '15 at 05:15
  • @MohitJain well, it says " If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element [...]" However, `p` does not point to an element of an array object in this case. Well, not an element of an array of ints anyway! I think the wording is not very precise. By this wording, `int x[5]; int *p = x + 5 - 5;` would be undefined and I don't think that is intended. On the other hand, it seems commonly accepted that adding `0` to a null pointer is undefined. – M.M Nov 19 '15 at 05:20
  • Agreed. Just one trivial correction, `int *p = x + 5 - 5;` is well defined, `int *p = x + 6 - 6;` is not. – Mohit Jain Nov 19 '15 at 05:44
  • @MohitJain no, I meant to say `x + 5 - 5`. `x + 5` does not point to an element of an array object, so if we take the above quote literally then it can't have anything subtracted from it except `- 1` which is explicitly mentioned later in the same section – M.M Nov 19 '15 at 05:47
  • Doesn't validity of `x + k` imply that `x + k - k` is valid? In the referred section this `k` is `1`. Otherwise `x + 5` should also be UB. – Mohit Jain Nov 19 '15 at 06:05
  • no, because `x` points to an element of an array object, but `x + 5` doesn't – M.M Nov 19 '15 at 07:02
0
malloc(1)

returns the address of a 1-byte buffer.

An int is bigger than 1 byte, generally speaking.

Thus, assigning an int value to a 1-byte buffer is UB.

Pointers returned by malloc do not need to be cast in C, as void * is implicitly converted to the target pointer type on assignment.
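A short sketch of an allocation that needs no cast and is sized for the pointed-to type. The helper name `new_int` is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate one int without casting malloc's result. */
int *new_int(int value)
{
    int *p = malloc(sizeof *p);   /* void * converts implicitly to int * */
    if (p != NULL)
        *p = value;               /* store only if the allocation succeeded */
    return p;                     /* caller must free() a non-null result */
}
```

Writing `sizeof *p` rather than a literal byte count keeps the request in step with the pointer's type, which is what `malloc(1)` in the question fails to do.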

Magisch
  • Generally speaking, unless you also consider DSPs and such. Which are more common than desktops anyway. – Deduplicator Nov 18 '15 at 20:21
  • 1
    I think technically `malloc(1)` returns a pointer to a buffer big enough to hold *at least 1* character. All the malloc() implementations I've seen round the size up to an integral multiple of the native word size. Which doesn't change the fact that the behavior is undefined, but does explain why the program doesn't immediately fall over. – TMN Nov 18 '15 at 20:37
  • @TMN That may be true, but it's not defined to be necessary in the C standard. So at best it's implementation-defined. – Magisch Nov 19 '15 at 16:55