11

If I define a structure...

struct LinkNode
{
  int node_val;
  struct LinkNode *next_node;
};

and then create a pointer to it...

struct LinkNode *mynode = malloc(sizeof(struct LinkNode));

...and then finally free() it...

free(mynode);

...I can still access the 'next_node' member of the structure.

mynode->next_node

My question is this: which piece of the underlying mechanics keeps track of the fact that this block of memory is supposed to represent a struct LinkNode? I'm a newbie to C, and I expected that after I used free() on the pointer to my LinkNode, I would no longer be able to access the members of that struct. I expected some sort of 'no longer available' warning.

I would love to know more about how the underlying process works.

Ducain
  • Would you expect the memory to evaporate on calling free? It is still there (or not), or it could have a different meaning. – wildplasser May 09 '13 at 17:28
  • @wildplasser are you being snarky there? I'm trying to understand not just the language syntax but why it works. Of course the memory doesn't evaporate. But when I write *pointer->member, I'm not sure how the mechanics of that works, and why it still works after a pointer is free()d. Hope that makes sense. – Ducain May 09 '13 at 17:32
  • The pointer still has the same value. After calling free() you abandoned the memory: you told malloc/free that you don't want to use it any more. Compare it to a telephone number: After I quit my telephone-attachment, my number is no longer valid. But you can still try to dial it. It might even be me answering the phone. Or Noise. Or somebody completely different. The number (=address) is still there, but using it is not valid anymore. It could point to the controls of a nuclear powerplant ... – wildplasser May 09 '13 at 17:37
  • Unfortunately I think the problem is that my question isn't clear enough. I understand the basics of what you're saying relating to free. What I don't understand, is how structure members are accessed, and how that relates to free, or if it does at all. I'll see if I can make my question better. Thanks much. – Ducain May 09 '13 at 17:40
  • Eric Lippert's [analogy](http://stackoverflow.com/a/6445794/25507) is appropriate here. – Josh Kelley May 09 '13 at 17:42
  • This has been asked many, many, many times before. Duplicate of [Why freed struct in C still has data?](http://stackoverflow.com/questions/2960064/why-freed-struct-in-c-still-has-data) – AnT stands with Russia May 09 '13 at 17:44
  • WRT addressing the structure members: check the (assembler) output of `gcc -S` to see how they work. `p->next` is essentially translated to `p + some_offset`. And that code (and offset) is of course the same after `p` has been freed. But it is invalid, since after freeing, `p` does not refer to a valid object anymore. – wildplasser May 09 '13 at 17:47
  • @wildplasser - that is exactly what I was looking for. Sorry my question phrasing didn't indicate that's really what I was after. Thank you. – Ducain May 09 '13 at 17:50
  • There is an additional detail: before c89/ANSI (on some/most unix platforms) it used to be a *requirement* that the pointer (or the object it used to point to) could still be used after the free(), given no intervening malloc/free calls. – wildplasser May 09 '13 at 17:56
  • Kudos for pointing out the -S option for the GCC compiler. Just used it, and though I don't understand assembly, that is an amazing resource to start looking at this stuff further. Wow. – Ducain May 09 '13 at 18:34

6 Answers

10

The compiled program no longer has any knowledge about struct LinkNode or a field named next_node, or anything like that. Any names are completely gone from the compiled program. The compiled program operates in terms of numerical values, which can play the roles of memory addresses, offsets, indices and so on.

In your example, when you read mynode->next_node in the source code of your program, it is compiled into machine code that simply reads the 4-byte numerical value from some reserved memory location (known as variable mynode in your source code), adds 4 to it (which is the offset of the next_node field) and reads the 4-byte value at the resulting address (which is mynode->next_node). This code, as you can see, operates in terms of integer values: addresses, sizes and offsets. It does not care about any names, like LinkNode or next_node. It does not care whether the memory is allocated and/or freed. It does not care whether any of these accesses are legal or not.

(The constant 4 I repeatedly use in the above example is specific for 32-bit platforms. On 64-bit platforms it would be replaced by 8 in most (or all) instances.)
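
To make the "base address plus offset" idea concrete, here is a minimal sketch in C. The helper name next_by_offset is made up for this illustration; it is not something the compiler emits under that name, it only mimics the arithmetic described above using the standard offsetof macro.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, NULL */
#include <string.h>   /* memcpy */

struct LinkNode
{
  int node_val;
  struct LinkNode *next_node;
};

/* Hypothetical helper: fetches next_node the way the generated code does,
   as "start address of the struct + a constant offset". No member names
   are involved once the offset has been computed. */
struct LinkNode *next_by_offset(const struct LinkNode *node)
{
  struct LinkNode *result;
  const unsigned char *base = (const unsigned char *)node;

  assert(node != NULL);

  /* offsetof gives the byte distance of next_node from the struct start:
     typically 4 on 32-bit targets and 8 on 64-bit ones. */
  memcpy(&result, base + offsetof(struct LinkNode, next_node), sizeof result);
  return result;
}
```

For a node that is still alive, next_by_offset(mynode) returns the same pointer as mynode->next_node. The offset is a compile-time constant, which is why the same instructions are produced whether or not the node has been freed.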

If an attempt is made to read memory that has been freed, these accesses might crash your program. Or they might not. It is a matter of pure luck. As far as the language is concerned, the behavior is undefined.

AnT stands with Russia
5

There isn't and you can't. This is a classic case of undefined behavior.

When you have undefined behavior, anything can happen. It may even appear to work, only to randomly crash a year later.

Antimony
  • This doesn't really answer the question that was asked, specifically why the relative positioning of the struct members is still valid, even if the result of accessing them is undefined. – Chris Stratton May 09 '13 at 17:54
  • @ChrisStratton: If you remove the lock from a public locker but leave your stuff inside, and then some time later you come back and look inside, it's possible your stuff will still be there. Or it might be some stuff that looks like yours but actually belongs to someone else who has similar taste. If you happen to find your stuff in the locker exactly as you last arranged it, the only "reason" it would be there arranged like that is that nobody as yet had happened to do anything else with the space. – supercat May 09 '13 at 18:05
  • The aspect this overlooks is that the locker 3rd to the right of yours is still the locker 3rd to the right of the one you had, even if it's no longer yours. Also, in the comments it's pointed out that the behavior is actually **not always** undefined. – Chris Stratton May 09 '13 at 18:08
  • @Chris Sorry, I didn't understand precisely what the asker wanted. Presumably he comes from a language like Python where fields don't (usually) have a fixed offset. – Antimony May 09 '13 at 18:10
  • @Antimony you're absolutely correct that my question wasn't clear enough, though even now I'm not sure how I'd improve it. My day job is 50% in .NET languages and 50% in SQL. I'm learning C on my own, because I've never had it, working through K&R. Hit a roadblock with this stuff today. Thanks. – Ducain May 09 '13 at 18:24
  • @Ducain, .net actually has the fixed offset field stuff too, it just hides it from you. You can even declare raw structs in C#. – Antimony May 09 '13 at 18:48
5

It works by pure luck, because the freed memory has not yet been overwritten by something else. Once you free the memory, it is your responsibility to avoid using it again.

Mike Woolf
2

No part of the underlying memory keeps track of it. It's just the semantics the programming language gives to the chunk of memory. You could, e.g., cast the pointer to something completely different and still access the same memory region. The catch is that this is more likely to lead to errors; in particular, type safety will be gone. In your case, just because you called free doesn't mean that the underlying memory changes at all. There is just some bookkeeping (think of a flag) in the memory allocator that marks this region as free again.

Think about it this way: the free function is something like a "minimal" memory management system. If your call required more than updating that bookkeeping, it would introduce unnecessary overhead. Also, when you access the member, you (i.e. the runtime) could check whether this memory region is marked as "free" or "in use". But that's overhead again.

Of course that doesn't mean it wouldn't make sense to do those kinds of things. It would avoid a lot of security holes and is done, for example, in .NET and Java. But those runtimes are much younger than C, and we have many more resources these days.

Mene
2

When your compiler translates your C code into executable machine code, a lot of information is thrown away, including type information. Where you write:

 int x = 42;

the generated code just copies a certain bit pattern into a certain chunk of memory (a chunk that might typically be 4 bytes). You can't tell by examining the machine code that the chunk of memory is an object of type int.
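
As a rough illustration of "just a bit pattern in a chunk of memory", the sketch below copies an int into a plain byte buffer and back. The helper names int_to_bytes and int_from_bytes are invented for this example; the point is only that the bytes carry no record of having been an int.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* memcpy */

/* Hypothetical helper: copies the object representation of an int into
   a caller-supplied buffer of at least sizeof(int) bytes. */
void int_to_bytes(int value, unsigned char *out)
{
  assert(out != NULL);
  memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: reinterprets sizeof(int) raw bytes as an int.
   Nothing in the bytes themselves says "I am an int"; the type exists
   only in the source code that chooses how to read them. */
int int_from_bytes(const unsigned char *bytes)
{
  int value;

  assert(bytes != NULL);
  memcpy(&value, bytes, sizeof value);
  return value;
}
```

Storing 42 with int_to_bytes and reading it back with int_from_bytes round-trips the value, although the buffer in between is only an array of unsigned char. Which byte holds the 42 depends on the platform's byte order.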

Similarly, when you write:

if (mynode->next_node == NULL) { /* ... */ }

the generated code will fetch a pointer-sized chunk of memory by dereferencing another pointer-sized chunk of memory, and compare the result to the system's representation of a null pointer (typically all-bits-zero). The generated code doesn't directly reflect the fact that next_node is a member of a struct, or anything about how the struct was allocated or whether it still exists.

The compiler can check a lot of things at compile time, but it doesn't necessarily generate code to perform checks at execution time. It's up to you as a programmer to avoid making errors in the first place.

In this specific case, after the call to free, mynode has an indeterminate value. It doesn't point to any valid object, but there's no requirement for the implementation to do anything with that knowledge. Calling free doesn't destroy the allocated memory, it merely makes it available for allocation by future calls to malloc.

There are a number of ways that an implementation could perform checks like this, and trigger a run-time error if you dereference a pointer after freeing it. But such checks are not required by the C language, and they're generally not implemented because (a) they would be quite expensive, making your program run more slowly, and (b) checks can't catch all errors anyway.
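
One way such a check could look, purely as a sketch and not how any particular C implementation works: hand out handles instead of raw pointers, and validate the handle on every access. Everything here (the fixed-size pool, NodeHandle, node_alloc, node_free, node_get) is hypothetical and exists only to show where the extra cost comes from.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 8

struct LinkNode
{
  int node_val;
  struct LinkNode *next_node;
};

/* Hypothetical checked "pointer": a slot index plus the generation the
   slot had when the handle was issued. */
struct NodeHandle
{
  size_t slot;
  unsigned generation;
};

static struct LinkNode pool_nodes[POOL_SLOTS];
static unsigned pool_generation[POOL_SLOTS];
static int pool_in_use[POOL_SLOTS];

/* Returns 1 and fills *out on success, 0 if the pool is full. */
int node_alloc(struct NodeHandle *out)
{
  size_t i;

  assert(out != NULL);
  for (i = 0; i < POOL_SLOTS; i++) {
    if (!pool_in_use[i]) {
      pool_in_use[i] = 1;
      out->slot = i;
      out->generation = pool_generation[i];
      return 1;
    }
  }
  return 0;
}

/* Returns the node, or NULL if the handle is stale or out of range.
   This comparison is the per-access overhead the answer refers to. */
struct LinkNode *node_get(struct NodeHandle h)
{
  if (h.slot >= POOL_SLOTS || !pool_in_use[h.slot] ||
      pool_generation[h.slot] != h.generation)
    return NULL;
  return &pool_nodes[h.slot];
}

/* Releases the slot and bumps its generation, which invalidates every
   handle issued for it earlier. Stale handles are ignored. */
void node_free(struct NodeHandle h)
{
  if (node_get(h) != NULL) {
    pool_in_use[h.slot] = 0;
    pool_generation[h.slot]++;
  }
}
```

With this scheme, node_get on a freed handle yields NULL instead of silently reading reused memory, even after the slot has been handed out again. The price is an extra lookup and comparison on every access, plus larger "pointers", and a raw struct LinkNode * obtained from node_get can still dangle, so not every error is caught.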

C is defined so that memory allocation and pointer manipulation will work correctly if your program does everything right. If you make certain errors that can be detected at compile time, the compiler can diagnose them. For example, assigning a pointer value to an integer object requires at least a compile-time warning. But other errors, such as dereferencing a freed pointer, cause your program to have undefined behavior. It's up to you, as a programmer, to avoid making those errors in the first place. If you fail, you're on your own.

Of course there are tools that can help. Valgrind is one; clever optimizing compilers are another. (Enabling optimization causes the compiler to perform more analysis of your code, and that can often enable it to diagnose more errors.) But ultimately C is not a language that holds your hand. It's a sharp tool -- and one that can be used to build safer tools, such as interpreted languages that do more run-time checking.

Keith Thompson
  • An optimizer can also help find bugs in the sense that it may break buggy code. Code that appears to work without optimization may be more obviously broken when compiled with optimization. – Antimony May 09 '13 at 18:13
  • @Antimony: Yes, if the code has undefined behavior, the compiler can generate whatever code it likes. Optimization can affect the decisions it makes, changing the actual behavior and revealing the bug. – Keith Thompson May 09 '13 at 18:15
  • Actually, in C there is not typically a check for null when dereferencing a pointer. Instead, it is blindly attempted, though on many modern systems this will result in a fault in the memory protection unit. And no, the value of mynode is **not** indeterminate. It still points where it used to; it just may be that the memory there is now being used for something else. – Chris Stratton May 09 '13 at 18:24
  • @ChrisStratton: Formally speaking, the pointer value does become indeterminate. ISO C11 standard, section 6.2.4p2 says: " The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime." On the machine level it will probably retain its same bit pattern and point to the same (virtual or physical) memory location, but the language standard doesn't promise that. – Keith Thompson May 09 '13 at 18:32
  • @KeithThompson: I suspect that's to allow for the possibility that an implementation could have pointers that consist of something that identifies a block of memory along with an offset to that block. Such a design would be allowable by the standard, and would allow for trapping of many types of errant pointer access. If one didn't mind bloating pointer types out to 128 bits, it could probably consistently trap on use of out-of-scope pointers (assuming a 64-bit counter would never overflow). – supercat May 09 '13 at 20:55
1

You should assign NULL to mynode itself:

mynode = NULL;

after freeing the memory, to indicate that you are no longer using the allocated block. (Assigning to mynode->next_node at that point would itself be a write to freed memory.)

Without the NULL assignment, mynode still points to the previously freed memory location.
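
A common defensive habit is to clear the pointer variable in the same step as freeing it. Below is a minimal sketch of that idea; free_node is a made-up helper name, not a standard function.

```c
#include <assert.h>
#include <stdlib.h>   /* free, NULL */

struct LinkNode
{
  int node_val;
  struct LinkNode *next_node;
};

/* Hypothetical helper: frees the node and sets the caller's pointer
   to NULL, so a later accidental use is a null-pointer access rather
   than a silent read of recycled memory. */
void free_node(struct LinkNode **node_ptr)
{
  assert(node_ptr != NULL);
  free(*node_ptr);    /* free(NULL) is a no-op, so a second call is harmless */
  *node_ptr = NULL;
}
```

It would be called as free_node(&mynode); afterwards mynode compares equal to NULL. Note that this only clears the one variable passed in: any other copies of the old pointer value still dangle, and dereferencing a null pointer is still undefined behavior, just one that tends to fail loudly on most systems.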