2

This is somewhat of a subjective question, but it seems like there should be a standard for this. I'm making a tree-like data structure, and I want to know the best way to pass new nodes from a function. I've had several ideas, but I don't know which is safest/most efficient.

Here's a simplification of my code:

typedef struct Node {
    struct Node *left;
    struct Node *right;
    int value;
} Node;

int f() {
    //do stuff
}

Node *new_node() {
    Node n = {NULL, NULL, f()};
    return &n;
}

int main() {
    Node a = {new_node(), new_node(), 0};
}

Obviously, this doesn't work, because the pointer returned by the new_node() function points to stack-allocated data that will be released as soon as new_node() ends. But what is the best way to get around that?

One possibility would be to allocate n on the heap, like this:

Node *new_node() {
    Node *n = (Node *) malloc(sizeof(Node)); //unsure if the cast is necessary here, but that's not relevant
    n->left = NULL;
    n->right = NULL;
    n->value = f();
    return n;
}

This feels off though, because it requires that the caller function handle memory clearing directly, which could get messy fast.

Another option I've seen (especially when the object is an array or buffer, rather than an object) is to pass a pointer to the function, and just modify the contents of the pointer.

void new_node(Node *n) {
    n->left = NULL;
    n->right = NULL;
    n->value = f();
}

int main() {
    Node n = {NULL, NULL, 0};
    Node n1 = n;
    new_node(&n);
    new_node(&n1);
    Node a = {&n, &n1, 0};
}

Or you could just pass the data directly, especially since Node is so small in this case:

 Node new_node() {
     Node n = {NULL, NULL, f()}
     return n;
 }

That seems like it would be slower, though I don't know for sure.

There are a few answers about this topic already, but they are in C++ and deal with both references and pointers, which I'm afraid I don't really understand the difference between.

What is the standard way to do this?

(Sorry this if this is too long)

a52
  • 232
  • 2
  • 9

4 Answers4

5

You covered all four available approaches, including one that does not work. Picking among them depends on your design preferences:

  • You correctly noted that the first approach is broken
  • Second approach is good in situations when the objects you create cannot be allocated statically, for example, because you do not know until runtime how many of them you are going to need
  • Third approach is good for initializing statically allocated objects, because it does not require copying, and it lets you avoid manual resource management
  • The last approach is less common because of the desire to avoid copying. This almost universally amounts to premature optimization, so this approach is perfectly valid as well.
Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
3

Since you asked, Do I cast the result of malloc? No.

As for your concern all methods have their pros and cons, so pick one for your project and stay committed to that decision (do not use one approach for one function and then another for another function of the tree).

The ownership can be either of the functions' (but the caller has to call the one that will de-allocate the tree), or of the caller himself.

In libraries for example, I would expect them to handle the memory allocations of their data-structures, and then me simply calling a function provided by the library itself to de-allocate the space. That way I am not getting into their internal details, which is good.

Usually I am giving ownership to the functions, but this is subjective.

Community
  • 1
  • 1
gsamaras
  • 71,951
  • 46
  • 188
  • 305
1

The basic decision here is who controls the memory. Do you let the caller decide and dump more work on them? Or do you handle it for them?

Why not both? Taking a page from OO constructor patterns, separate the creation from the initialization.

void *Node_init( Node *node ) {
    node->left = NULL;
    node->right = NULL;
    node->value = 0;
}

Node *Node_new() {
    Node *node = malloc(sizeof(Node));

    Node_init(node);

    return node;
}

void *Node_free( Node *node ) {
    free(node);
}

Note that I don't hard code any sort of function call initialization of node->value because that would limit the flexibility of these Node functions, it's just set to 0 to ensure it's not filled with garbage. And because the caller writing node->value = ... is really trivial.

Now you can do both.

int main(void) {
    Node my_node;
    Node_init(my_node);

    Node *your_node = Node_new();

    // Use them as you like

    Node_free(your_node);
}

HOWEVER! The problem of who controls the memory might cause problems down the road with a connected tree structure like this. If you want to destroy the whole tree, you'll have to crawl it and destroy all the nodes. But if you don't know who owns a node's memory, that cleanup will be difficult.

So, for a tree or graph, you should probably simplify things by allowing just one memory pattern. Heap memory is more generically applicable, especially in larger programs, so I'd go with that.

Schwern
  • 153,029
  • 25
  • 195
  • 336
  • This is a good idea, but like you said, it could get a bit complicated with tree structures. I'll keep it in mind! – a52 May 11 '17 at 18:59
  • This ended up being the closest to what I used. I really like the idea of OO-based creation and deletion functions; they're so much nicer than writing malloc() and free() every time. – a52 May 12 '17 at 17:07
1

If you have a known number of nodes, there wouldn't be any need for a linked list. That means you will need to dynamically allocate the nodes in practical scenarios. The first approach therefore requires the less boilerplate code.

Node* node = new_node();
if (node == NULL)
   goto error;

However, that approach forces one to use malloc instead of an another allocator.[1] The second approach affords more flexibility[2], and the extra code needed is rather minimal. Taking the second approach makes the library far more reusable.

Node* node = malloc(sizeof(Node));
if (node == NULL)
   goto error;

init_node(node);

I don't see any advantage to the third approach.


  1. For example, a Perl extension should use Perl's memory allocator rather than malloc. For example, statically-allocated memory is strongly preferred over dynamically-allocated memory in embedded systems.

  2. It can be used with both statically- and dynamically-allocated structures, and you can use an allocator other than malloc.

ikegami
  • 367,544
  • 15
  • 269
  • 518