Why does MISRA C state that a copy of pointers can cause a memory exception?

Question

MISRA C 2012 directive 4.12 is "Dynamic memory allocation should not be used".

As an example, the document provides this sample of code:

    char *p = (char *) malloc(10);
    char *q;

    free(p);
    q = p; /* Undefined behaviour - value of p is indeterminate */

And the document states that:

Although the value stored in the pointer is unchanged following the call to free, it is possible, on some targets, that the memory to which it points no longer exists and the act of copying that pointer could cause a memory exception.

I agree with almost all of the sentence except the end. As p and q are both allocated on the stack, how can copying the pointer cause a memory exception?

The pointer `p` is a local variable on the stack, but it points to the heap. And if you dereference `q` after your code snippet, you have *undefined behavior*. — Basile Starynkevitch, Nov 02 '14 at 21:11
@BasileStarynkevitch: Possibly even before that; see the answer by 2501. — Deduplicator, Nov 02 '14 at 21:18
A typical example of over-reaction. Since you can mis-use dynamic allocation, it "should not be used". Guess what? Following that logic, you probably should restrict yourself to `unsigned int` when writing C code. And even `unsigned` can be mis-used. — MSalters, Nov 02 '14 at 22:30
You can't restrict yourself to unsigned int because it is not recommended by MISRA ;) Jokes aside, dynamic allocation can be used in legitimate cases with MISRA; it's just not recommended. The document also mentions the problem encountered when insufficient memory is available and the time that may be needed to perform allocation/deallocation in some cases. — toto, Nov 02 '14 at 23:45
BTW in 16-bit protected mode on x86 the act of loading an invalid pointer (more precisely an invalid selector) can cause a processor exception, so this isn't purely a theoretical issue. See the MOV instruction in Volume 2 of [Intel® 64 and IA-32 Architectures Software Developer Manuals](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html). — user786653, Nov 03 '14 at 17:07
@MSalters Note that MISRA is not your run-of-the-mill coding standard. It's for embedded systems in contexts like aerospace and medical devices. The reasoning is not "it can be misused"; the reasoning is "it's rarely needed for our applications, and not using it prevents a class of run-time error (out of memory) which is hard to handle robustly, and robustness is critical in our applications". And, of course, "should" is not "shall", as toto explained. — delnan, Nov 03 '14 at 17:10
@delnan: I'm entirely familiar with its background, having worked on Automotive Embedded Software myself. Not just fancy stuff, but even bootloaders. Even there dynamic memory made sense - I just needed to get a file into memory, and I wouldn't know up front exactly how big it was. "Out of memory" was a problem for the people creating that file, not me ;) — MSalters, Nov 03 '14 at 19:06
@toto: any idea where one can access that MISRA 2012 document? — Giorgi Moniava, Dec 18 '14 at 12:41
@giorgim: I bought it on misra website for about 20 euros. A pdf document is generated with your name on every page and I seem to recall that you need to pay with a credit card number. That's certainly why it's not easy to find it elsewhere. — toto, Dec 18 '14 at 20:08

score 44 · Accepted Answer · edited Jun 20 '20 at 09:12

According to the Standard, copying the pointer (`q = p;`) is undefined behaviour.

Annex J.2 (Undefined behavior) lists:

The value of a pointer to an object whose lifetime has ended is used (6.2.4).

Going to that chapter we see that:

6.2.4 Storage durations of objects

The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address,[33] and retains its last-stored value throughout its lifetime.[34] If an object is referred to outside of its lifetime, the behavior is undefined. **The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.**

What is indeterminate:

3.19.2 indeterminate value: either an unspecified value or a trap representation
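Applied to the MISRA sample, this rule means every read of `p` has to happen while the allocated object is still alive. A minimal sketch, assuming a hosted implementation; the function name `misra_sample_reordered` is hypothetical. Because `q` holds the same address as `p`, freeing through either one leaves both indeterminate:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reordering of the MISRA sample: the copy q = p happens
   before free(), so no indeterminate pointer value is ever read. */
bool misra_sample_reordered(void)
{
    char *p = malloc(10);
    char *q;
    bool same;

    q = p;            /* well defined: p points to a live object (or is NULL) */
    same = (q == p);  /* comparing two valid pointer values is also fine */
    free(q);          /* releases the block: p and q are now BOTH indeterminate */

    /* From here on, neither p nor q may be read until it is assigned again. */
    return same;
}
```

The function always returns true; what matters is the ordering of the copy relative to `free()`, not the result.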

answered Nov 02 '14 at 21:08

2501

+1 And some architectures actually say that all pointers not pointing into valid memory (or just past?) are trap-representations. – Deduplicator Nov 02 '14 at 21:10
http://www.ibm.com/developerworks/library/pa-ctypes3/ has a really good explanation about the background behind trap representations. – Blagovest Buyukliev Nov 02 '14 at 21:20
Thank you all for your responses and links. – toto Nov 02 '14 at 21:26
As an example of *why* it matters that it's UB, even on implementations where there are no trap representations, consider what happens if you replace the last line by `q = malloc(10); if (p==q) ...` – R.. GitHub STOP HELPING ICE Nov 03 '14 at 01:48

score 14 · Answer 2 · edited May 23 '17 at 11:53

Once you free an object through the pointer, all pointers to that memory become indeterminate. Even just reading an indeterminate pointer value is undefined behaviour (UB). The following is UB:

    #include <stdlib.h>

    void example(void)
    {
        char *p = malloc(5);
        free(p);
        if (p == NULL) // UB: even just reading the value of p, as here, is UB
        {
        }
    }

answered Nov 02 '14 at 21:39

Giorgi Moniava

Ah, here we go, somebody gets it. (Please note this is only true because the compiler is allowed to assume the semantics of standard library functions.) – Joshua Nov 03 '14 at 00:35
@Joshua: sorry, didn't get you exactly – Giorgi Moniava Nov 03 '14 at 06:22
@pseudonym27 If you used `malloc` from the standard library but you were overriding `free` with something else, the code would not have undefined behaviour. But since the compiler can assume that `free` is indeed the standard library function, it can perform optimizations, which would lead to the code being undefined. – kasperd Nov 03 '14 at 11:31
You need to provide a convincing explanation on how calling `free` could possibly affect the value of `p`. – barak manos Nov 05 '14 at 07:41
@barakmanos: https://www.securecoding.cert.org/confluence/display/seccode/MEM30-C.+Do+not+access+freed+memory? – Giorgi Moniava Nov 05 '14 at 12:37
@barakmanos - because that is what the C Standard specifies. The pointer is indeterminate after `free()` – Andrew Nov 06 '14 at 11:07
@Andrew: That's not a practical answer with logical reasoning. It sounds more like a theological answer (something like "because god says so"). – barak manos Nov 06 '14 at 11:24
@barakmanos - you are correct, but it is also the truth. It is one of the unfortunate side-effects of trying to standardise a language after it is in widespread use. WG14 (the ISO committee) did not want to break existing functionality so took the widest case... Many compilers will set the pointer to NULL after free() but many will not... it is indeterminate. – Andrew Nov 06 '14 at 11:26
@Andrew: People are killing each other because they claim that it is written somewhere that they should do so (a.k.a. "specified by the standard"). Personally I doubt that there's a good enough reason for them to do so, but even if there is, it is sure as hell not because of what their "standard" specifies. – barak manos Nov 06 '14 at 11:30
I've added a fuller response (about the history) as an Answer... maybe we can move down there? – Andrew Nov 06 '14 at 11:37

score 4 · Answer 3 · answered Nov 06 '14 at 11:36

First, some history...

When ISO/IEC JTC1/SC22/WG14 first started to formalise the C Language (to produce what is now ISO/IEC 9899:2011) they had a problem.

Many compiler vendors had interpreted things in different ways.

Early on, they made a decision to not break any existing functionality... so where compiler implementations were divergent, the Standard offers unspecified and undefined behaviours.

MISRA C attempts to trap the pitfalls that these behaviours will trigger. So much for the theory...

--

Now to the specifics of this question:

Given that the point of free() is to release the dynamic memory back to the heap, there were three possible implementations, all of which were "in the wild":

1. reset the pointer to NULL
2. leave the pointer as was
3. destroy the pointer

The Standard could not mandate any one of these, so formally leaves the behaviour as undefined - your implementation may follow one path, but a different compiler could do something else... you cannot assume, and it is dangerous to rely on a method.

Personally, I'd rather the Standard was specific, and required free() to set the pointer to NULL, but that's just my opinion.
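Because `free()` receives its argument by value, it cannot clear the caller's variable; a caller-side macro can approximate the behaviour described above. A sketch only: `FREE_AND_NULL` and `free_and_null_demo` are hypothetical names, not part of any standard library, and the macro evaluates its argument twice, so it should be given a plain variable rather than an expression with side effects:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: free the block, then give the variable a determinate
   value so that a later "if (ptr == NULL)" test is well defined. */
#define FREE_AND_NULL(ptr) do { free(ptr); (ptr) = NULL; } while (0)

bool free_and_null_demo(void)
{
    char *p = malloc(10);
    FREE_AND_NULL(p);   /* p is assigned NULL immediately after free() */
    return p == NULL;   /* reads a determinate value: no UB */
}
```

Other copies of the same pointer (such as `q` in the question) are not touched by the macro, so they remain indeterminate.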

--

So the TL;DR answer is, unfortunately: because it is!

Eh? Since the standard declaration of free() is `void free(void *ptr);` the compiler can't do anything with the pointer itself, just the contents. The compiler cannot set it to NULL or "destroy it" (how do you destroy a pointer?), or do anything else in a fancy, implementation-defined way, since _the free function only has access to a local copy of the pointer_. It can't affect the caller's version of the pointer no matter how hard it tries. You'd have to change the C standard to `free (void**)` which ain't gonna happen. So the C standard does indirectly mandate 2) above. — Lundin, Nov 10 '14 at 13:55
Changing the C standard ain't going to happen, no... the undefined behaviour will remain undefined! — Andrew, Nov 10 '14 at 13:59
that is, `free` couldn't be a function in C if it were to consistently NULL a pointer. It needed to be an operator, like `delete` in C++. — Antti Haapala -- Слава Україні, Mar 13 '16 at 12:14

score 3 · Answer 4 · answered Nov 02 '14 at 21:21

While p and q are both pointer variables on the stack, the memory address returned by malloc() is not on the stack.

Once a memory area obtained from a successful malloc() has been freed, there is no telling who may be using it or what has become of it.

So once free() is used to free an area of memory previously obtained using malloc() an attempt to use the memory area is an undefined type of action. You might get lucky and it will work. You might be unlucky and it will not. Once you free() a memory area, you no longer own it, something else does.

The issue here would appear to be what machine code is involved in copying a value from one memory location to another. Remember that MISRA targets embedded software development so the question is always what kind of funky processors are out there that do something special with a copy.

The MISRA standards are all about robustness, reliability, and eliminating risk of software failure. They are quite picky.

Richard Chambers

The question was not about the memory allocated but about the pointers themselves. – toto Nov 02 '14 at 21:25
@toto, yes I realize that it was about the pointers themselves. memory allocation was a lead in since the pointers point to a malloced area. Please take a look at fourth para. – Richard Chambers Nov 02 '14 at 21:28
Yes, thank you for your response; I thought you had misunderstood my question because of your first three paragraphs. – toto Nov 02 '14 at 21:52
The 'undefined' is more due to advanced processors than to simple embedded ones. – H H Nov 03 '14 at 16:00
You presuppose that the local variables are on the stack... that is not necessarily the case. But either way, it is not relevant! – Andrew Nov 06 '14 at 11:09
@Andrew, since the code snip shows a variable definition with assignment using `malloc()` and since the question actually contains the phrase "As p and q are both allocated on the stack", the presupposition does seem to be the case. – Richard Chambers Nov 06 '14 at 13:55

score 3 · Answer 5 · answered Nov 14 '16 at 08:12

The value of p cannot be used as such after the memory it points to has been freed. More generally, the value of an uninitialized pointer has the same status: even just reading it for the purpose of copying it invokes undefined behavior.

The reason for this surprising restriction is the possibility of trap representations. Freeing the memory pointed to by p can make its value become a trap representation.

I remember one such target that behaved this way, back in the early 1990s. It was not an embedded target, and it was in widespread use at the time: Windows 2.x. It used the Intel architecture in 16-bit protected mode, where pointers were 32 bits wide, with a 16-bit selector and a 16-bit offset. In order to access the memory, pointers were loaded into a pair of registers (a segment register and an address register) with a specific instruction:

    LES  BX,[BP+4]   ; load pointer into ES:BX

Loading the selector part of the pointer value into a segment register had the side effect of validating the selector value: if the selector did not point to a valid memory segment, an exception would be fired.

The innocent-looking statement q = p; could be compiled in many different ways:

    MOV  AX,[BP+4]    ; loading via DX:AX registers: no side effects
    MOV  DX,[BP+6]
    MOV  [BP-6],AX
    MOV  [BP-4],DX

or

    LES  BX,[BP+4]    ; loading via ES:BX registers: side effects
    MOV  [BP-6],BX
    MOV  [BP-4],ES

The second option has two advantages:

1. The code is more compact: one instruction fewer.
2. The pointer value is loaded into registers that can be used directly to dereference the memory, which can result in fewer instructions generated for subsequent statements.

Freeing the memory may unmap the segment and make the selector invalid. The value becomes a trap value, and loading it into ES:BX fires an exception, also called a trap on some architectures.
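The difference between the two code sequences can be modelled in plain C. This is a toy simulation, not real x86 behaviour: the `far_ptr` type, the 8-entry `segment_present` table and all helper names are hypothetical stand-ins for the descriptor table the CPU consults when a selector is loaded into a segment register:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical): a 16:16 far pointer. */
typedef struct {
    uint16_t selector;
    uint16_t offset;
} far_ptr;

/* Hypothetical descriptor table: true means the segment is currently mapped. */
#define NUM_SEGMENTS 8
static bool segment_present[NUM_SEGMENTS];

/* Stand-ins for malloc()/free() that map and unmap a whole segment.
   Assumes selector < NUM_SEGMENTS. */
static far_ptr model_alloc(uint16_t selector)
{
    segment_present[selector] = true;
    far_ptr fp = { selector, 0 };
    return fp;
}

static void model_free(far_ptr fp)
{
    segment_present[fp.selector] = false;
}

/* Models the MOV AX/DX sequence: plain word copies, no validation. */
static far_ptr copy_via_mov(far_ptr src)
{
    return src;
}

/* Models the LES sequence: loading the selector validates it. Returns false
   where the real CPU would raise an exception instead of completing the copy. */
static bool copy_via_les(far_ptr src, far_ptr *dst)
{
    if (src.selector >= NUM_SEGMENTS || !segment_present[src.selector])
        return false;
    *dst = src;
    return true;
}
```

After `model_free(p)`, `copy_via_mov(p)` still succeeds while `copy_via_les(p, &q)` reports the fault: in this model, the same C statement `q = p;` traps or not depending on which instruction sequence the compiler picked.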

Not all compilers would use the LES instruction for just copying pointer values because it was slower, but some did when instructed to generate compact code, a common choice then as memory was rather expensive and scarce.

The C Standard allows for this and describes as undefined behavior the case where:

The value of a pointer to an object whose lifetime has ended is used (6.2.4).

because this value has become indeterminate as defined this way:

3.19.2 indeterminate value: either an unspecified value or a trap representation

Note however that you can still manipulate the value by aliasing via a character type:

    #include <stdio.h>   /* printf */
    #include <string.h>  /* memcpy */

    /* dumping the value of the free'd pointer */
    unsigned char *pc = (unsigned char *)&p;
    size_t i;
    for (i = 0; i < sizeof(p); i++)
        printf("%02X", pc[i]);   /* no problem here */

    /* copying the value of the free'd pointer */
    memcpy(&q, &p, sizeof(p));   /* no problem either */

score 0 · Answer 6 · answered Nov 14 '16 at 00:08

There are two reasons that code which examines a pointer after freeing it is problematic even if the pointer is never dereferenced:

1. The authors of the C Standard did not wish to interfere with implementations of the language on platforms where pointers contain information about the surrounding memory blocks, and which might validate such pointers whenever anything is done with them, whether they are dereferenced or not. If such platforms exist, code which uses pointers in violation of the Standard might not work with them.
2. Some compilers operate on the presumption that a program will never receive any combination of inputs that would invoke UB, and thus any combination of inputs that would produce UB should be presumed impossible. As a consequence of this, even forms of UB which would have no detrimental effect on the target platform if a compiler simply ignored them may end up having arbitrary and unlimited side-effects.

IMHO, there is no reason why equality, relational, or pointer-difference operators upon freed pointers should have any adverse effect on any modern system, but because it is fashionable for compilers to apply crazy "optimizations", useful constructs which should be usable on commonplace platforms have become dangerous.
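For code that genuinely needs to know whether a later allocation landed on a freed address, one way to avoid reading a freed pointer at all is to capture the address as an integer while the pointer is still valid. A sketch under two assumptions: the implementation provides the optional `uintptr_t` type, and the name `address_reused` is hypothetical. Whether an address is actually reused is up to the allocator, so the result is not predictable:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Reports whether malloc() returned a block at the same address as a block
   freed just before. Only integers are compared after free(), so no freed
   pointer value is ever read. */
bool address_reused(size_t size)
{
    char *p = malloc(size);
    if (p == NULL)
        return false;

    uintptr_t old_addr = (uintptr_t)(void *)p; /* read p while it is valid */
    free(p);                                   /* p is indeterminate from here */
    p = NULL;                                  /* give p a determinate value again */

    char *q = malloc(size);
    if (q == NULL)
        return false;

    bool reused = ((uintptr_t)(void *)q == old_addr); /* integer comparison */
    free(q);
    return reused;
}
```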

score -1 · Answer 7 · answered Nov 03 '14 at 09:28

The poor wording in the sample code is throwing you off.

It says "value of p is indeterminate", but it is not the value of p that is indeterminate, because p still has the same value (the address of a memory block which has been released).

Calling free(p) does not change p -- p is only changed once you leave the scope in which p is defined.

Instead, it is the value of what p points to that is indeterminate, since the memory block has been released, and it may as well be unmapped by the operating system. Accessing it either through p or through an aliased pointer (q) may cause an access violation.
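The disagreement in this thread turns on the difference between the bytes stored in `p` and its value. The Standard allows any object's bytes to be inspected through character access even when its value is indeterminate, so the bytes can be compared without using the pointer value. A minimal sketch (the function name is hypothetical): on common implementations it reports the bytes as unchanged, but in the Standard's terms the value is still indeterminate and the result is not guaranteed:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if the bytes of p are identical before and after free(p).
   Only the object representation is examined, via memcpy/memcmp on
   unsigned char arrays; the pointer value itself is never used after free. */
bool free_left_bytes_unchanged(void)
{
    char *p = malloc(10);
    unsigned char before[sizeof p];
    unsigned char after[sizeof p];

    memcpy(before, &p, sizeof p);  /* snapshot the representation of p */
    free(p);                       /* the value of p is now indeterminate */
    memcpy(after, &p, sizeof p);   /* character-wise copy: still permitted */

    return memcmp(before, after, sizeof p) == 0;
}
```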

Igor Levicki

Of course the value of what p points to is indeterminate but here the topic is on the pointer p itself. The wording used in the sample is correct. Check the responses provided by others. – toto Nov 03 '14 at 11:52
I did check, that is why I wrote an answer. Indeterminate means having no definite or definable value, but p has value. Its value is the same as it was before free(p). Saying that value of p is indeterminate is wrong at least in mathematical sense. – Igor Levicki Nov 03 '14 at 18:18
@IgorLevicki The standard provides a definition of what 'indeterminate' means in the scope of the document (see 3.19.2 in 2501's answer). In the case of trap representations, the value cannot be determined because the act of reading/copying the value triggers an exception. – Mike Strobel Nov 03 '14 at 21:31
@Mike Strobel: Standards should not redefine common words to suit their broken definitions. The word "indeterminate" already has well-established meaning and the only way a pointer could be indeterminate aside from redefining what "indeterminate" means is if it was capable of having a value of NaN because every other numerical value assigned to a pointer variable is valid. What is not valid is dereferencing numerical values which are not mapped to and backed by actual memory. – Igor Levicki Jun 29 '15 at 10:58
@IgorLevicki: The Standard imposes no requirements whatsoever upon anything that might happen once code reaches a point where it will inevitably try to do *anything* with an invalid pointer--even copy it from one variable to another, or compare it against another pointer. Hyper-modern compilers like to exploit that freedom in strange and bizarre ways. – supercat Jun 21 '16 at 18:19
@supercat What you call "variables" (p and q) are stored in general purpose CPU registers under the hood, and the most popular x86 architecture (and probably the next most popular ARM) do not trap general purpose CPU register copying, and do not define "undefined" pointer value in hardware. Even arbitrary picked "null" value (which is a fancy way of writing 0) is a valid memory pointer from a CPU viewpoint as long as it is backed by physical memory. If you instruct C compiler to copy p to q, it will generate code to copy it, and it won't be the act of that copying that will crash the program. – Igor Levicki Nov 12 '16 at 13:32
@IgorLevicki: If the Standard imposes no requirements upon what happens if a program does something, a conforming implementation may behave in arbitrary fashion even if on a target platform where there would be a clear and logical behavior. I don't believe quality implementations should go out of their way to behave in useless fashion in such cases, but the authors of "modern" compilers like gcc and clang think otherwise. – supercat Nov 13 '16 at 00:50
@supercat I don't know what you are referring to when it comes to gcc and clang "strange and bizarre ways", but I do know that for example Intel compiler for Windows usually tries to mimic Microsoft compiler's undefined behavior. In this particular case however, there is nothing a compiler can or should do except perhaps issue a warning to the tune of "use after free()" during compilation. The behavior is undefined in the standard because the underlying hardware platform is the one defining it and I already described above how that works for x86 and ARM. – Igor Levicki Nov 13 '16 at 13:17
@IgorLevicki: GCC and clang will sometimes decide that if a function would invoke UB if invoked with a particular value, any conditional test which would look for that value but wouldn't prevent the UB can be omitted. For example, in gcc, `unsigned mul(unsigned short x, unsigned short y) {return x*y;}` can disturb the behavior of surrounding code in cases where the arithmetical value of the product would be between INT_MAX+1u and UINT_MAX. – supercat Nov 13 '16 at 21:37
This answer is wrong; the standard clearly says that the value of `p` is indeterminate (not the value of what `p` points to). See the bolded text in the answer by "2501" – M.M Sep 06 '18 at 03:39
@M.M. x86 asm: `sub esp, 4; mov esi, esp; sub esp, 4; mov edi, esp; push 1024; call _malloc; mov [esi], eax; mov [edi], eax; push eax; call _free;` Code allocates space on stack for 2 ptr vars (esi points to p, edi points to q), then allocates 1KB of RAM, copies ptr to mem block to said 2 ptr vars and frees said RAM. Note how after free() call eax can still hold the ptr to free'd mem block, and both stack variables p and q pointed to by esi and edi still hold the same ptr to free'd mem block. This is but one CPU architecture, but it is still wrong to redefine the word "undefined". – Igor Levicki Sep 12 '18 at 15:15
@IgorLevicki the meaning of "undefined" is defined by the C Standard. Same goes for "indeterminate" . Your assembly example is completely irrelevant – M.M Sep 12 '18 at 21:39
@M.M What I am arguing here is that a good standard should not need to redefine common words to explain its basic concepts, and the existence of this question on SO proves my point. – Igor Levicki Nov 05 '18 at 09:04
@IgorLevicki so you'd prefer it invented new non-English words for its concepts? – M.M Nov 05 '18 at 09:47
@M.M No, I would just prefer if people stopped misusing common words with well-defined meanings. `p` and `q` are not undefined because they still have a value after the call to `free`. I stand by the assessment that the wording in the standard is confusing and the existence of this question proves it. I can't understand how people can treat the badly worded standard as something written in stone but at the same time they accept and allow redefining the meaning of words as if language shouldn't have a standard meaning for everyone. – Igor Levicki Feb 24 '23 at 02:13
@IgorLevicki Words have specific meanings in technical contexts that differ from layman use; this is an unavoidable property of language, because (a) most people just talk without trying to conform to a single definition of each word across all domains, and (b) when there is a specialized concept in a domain, your only options are to refine the meaning of an existing word or to create a new word. You say you don't want the standard to create a new word; therefore you must be concurring that it should refine an existing word, in this case "indeterminate". – M.M Feb 25 '23 at 03:08
You even say in your first comment "Indeterminate means having no definite or definable value", which is the same meaning it has in the C Standard. The part where you're mistaken is claiming that `p` retains its value after being `freed`. The standard says directly that `p` becomes indeterminate, and the language is defined by the standard, not by properties of an architecture. – M.M Feb 25 '23 at 03:10
@M.M You say "Words have specific meaning in technical contexts that differs from layman use, this is an unavoidable property of language", but we are talking about (and the standard is using) the mathematical definition of the word "Indeterminate". Thus, in the mathematical sense `p` cannot become indeterminate after `free()`, because `p` is ultimately mapped to a CPU register and majority (if not all) modern architectures do not have a concept of an undefined value for a CPU register. And then there's the fact that calling `free()` does not change the value of `p` at all. – Igor Levicki Mar 08 '23 at 17:08
You say "The standard says directly that `p` becomes indeterminate, and the language is defined by the standard, not by properties of an architecture" and I ask you this -- how is that definition enforced by any of the existing C compilers? And if the definition from the standard cannot be enforced, why does it even exist? Why bother with complex and confusing language misusing mathematical terms in the process when you can just say "Thou shall not use a pointer to allocated memory block after it has been freed"? – Igor Levicki Mar 08 '23 at 17:20
@IgorLevicki the standard says nothing about CPU registers. They are irrelevant to this discussion. C is defined on an abstract machine, and the standard explicitly says that `free(p)` changes the value of `p` in the abstract machine. – M.M Mar 09 '23 at 09:07
@M.M "Indeterminate" means having no definite or definable value while `p` still has a (same) numeric value after a call to `free(p)`. Standard is therefore wrong in saying that `free(p)` changes `p` because no C implementation actually changes `p` (because that'd be ridiculous). It's simply poorly worded and it abuses common language to say something that should be said in a different way. – Igor Levicki Mar 13 '23 at 02:36
The language is defined by the standard, not by the behaviour of common compilers you have experience with. If your position is "the standard is wrong", then there is no point continuing this argument – M.M Mar 13 '23 at 02:42
Also it would not be "ridiculous" for a compiler to set `p` to some debugging value to help detect use of freed memory. I have read examples of this previously, as well as cases where the compiler optimizes the stack allocation to use the same space as `p` to store some other local variable whose usage doesn't begin until after `free(p)` . – M.M Mar 13 '23 at 02:45
C language is not something abstract that you can use in any meaningful way without a compiler. Therefore, if all C compilers that are in use nowadays are not changing `p` after `free(p)`, then the standard simply doesn't match the reality which means it is wrong. It is also wrong in how it uses the word "indeterminate", and the concept that is trying to explain could be explained much better without distorting standard mathematical definitions. – Igor Levicki Mar 16 '23 at 02:27

score -3 · Answer 8 · answered Nov 03 '14 at 13:17

An important concept to internalize is the meaning of "indeterminate" or "undefined" behavior. It is exactly that: unknown and unknowable. We would often tell students, "It is perfectly legitimate for your computer to melt into a shapeless blob, or for the disk to fly off to Mars."

As I read the original documentation quoted, I did not see anywhere that it said not to use malloc. It merely points out that an erroneous program will fail. Actually, having the program take a memory exception is a Good Thing, because it tells you immediately that your program is defective. Why the document suggests this might be a Bad Thing escapes me.

What is a Bad Thing is that on most architectures, it will NOT take a memory exception. Continuing to use that pointer will produce erroneous values, potentially render the heap unusable, and, if that same block of storage is allocated for a different use, corrupt the valid data of that use or interpret its values as your own. Bottom line: don't use 'stale' pointers! Or, to put it another way, writing defective code means that it won't work.

Furthermore, the act of assigning p to q is most decidedly NOT "undefined". The bits stored in the variable p, which are meaningless nonsense, are quite easily, and correctly, copied to q. All this means now is that any value that is accessed by p can now also be accessed by q, and since p is undefined nonsense, q is now undefined nonsense. So using either one of them to read or write will produce "undefined" results. If you are lucky enough to be running on an architecture that can cause this to take a memory fault, you will easily detect the improper usage. Otherwise, using either pointer means your program is defective. Plan on spending a lot of hours finding it.

No, this is wrong. `p` may be a "trap representation" such that simply copying it will be an error. — nobody, Nov 03 '14 at 13:31
@AndrewMedico: Not even a NULL pointer is a "trap representation", or you would not be able to load 0 into any CPU register without triggering undefined behavior. — Igor Levicki, Nov 03 '14 at 18:41
NULL isn't, but freed pointer values may be. See http://www.ibm.com/developerworks/library/pa-ctypes3/ (linked by @BlagovestBuyukliev on 2501's excellent answer). — nobody, Nov 03 '14 at 18:52
I read it -- it says "Pointers which refer to freed memory ... become indeterminate" but it is not the pointer which becomes indeterminate because its value is known until the location holding it is overwritten. — Igor Levicki, Nov 28 '14 at 17:20
"This is to accommodate processors on which some amount of validation of addresses occurs when an address register is loaded.", char *q could be in a special register which validates any input. — QuentinUK, Sep 17 '15 at 00:03
