How to properly recurse and update values within a self-referential struct?

Question

I have a linked-list like chain of structs that point to eachother. Normally it wouldn't be so difficult, but I'm running into problems freeing memory that has already been freed, because different structs can point to the same struct down below. Now, I am not referring to freeing a node itself, but rather an element within the node (a matrix). The issue is not that I am freeing an object that I am referencing later, it is that I am freeing a member of an object multiple times. It looks like this:

      d
     / \
    c   b
  /   \
 a     b

where if I just recurse through the graph, I'll end up freeing a member of b twice. Thus, I need a way to update b's state with some boolean value done after I free it's member once so that when I come across it a second time I can avoid freeing something that has already been freed.

The structure obj is acting as a sort of node within this chain. The goal is for prev to be an array of previous nodes (order doesn't matter). If a node a has a previous node prev b, where b also has prev set to say, c, the purpose is to be able to recurse through until you find all nodes that have no prev set (effectively a leaf node). The issue is that, the way this is currently implemented, updating the states of any nodes that are stored in prev does not actually update the value of the initial node - instead, it just updates the value of the node within obj->prev.

I have some test code for the scenario that I am dealing with:

#include <stdio.h>
#include <stdlib.h>

typedef struct obj obj;
struct obj {
    char name;
    int done;
    obj* prev;
};

void recurse(obj *o) {
    
    printf("(before) node %c: %i\n", o->name, o->done);
    o->done = 0;
    printf("(after) node %c: %i\n", o->name, o->done);

    if (o->prev) {
        // *this* should be generalized to handle for n elems in 
        // prev, and can be handled by just adding a num_prev value 
        // to obj
        recurse(o->prev);
        recurse(o->prev+1);
    }
}

int main() {

    obj node1 = {'a', 1, NULL};
    obj node2 = {'b', 1, NULL};

    obj node3 = {'c', 0, NULL};
    node3.prev = (obj *)malloc(2 * sizeof(obj));
    *node3.prev = node1;
    *(node3.prev+1) = node2;

    obj node4 = {'d', 0, NULL};
    node4.prev = (obj *)malloc(2 * sizeof(obj));
    *node4.prev = node3;
    *(node4.prev+1) = node2;

    recurse(&node4);
    printf("(final) node %c: %i\n", node1.name, node1.done);
    printf("(final) node %c: %i\n", node2.name, node2.done);

    free(node3.prev);
    free(node4.prev);
}

where the output ends up being:

(before) node d: 0
(after) node d: 0
(before) node c: 0
(after) node c: 0
(before) node a: 1
(after) node a: 0
(before) node b: 1
(after) node b: 0
(before) node b: 1
(after) node b: 0
(final) node a: 1
(final) node b: 1

As you can see through the results, two things are happening. First, b (node2) is getting recursed twice, but the second time the value is still set to 1 (even though it was just set to 0). The second thing is that after the whole program execution, a (node1) and b (node2) are both not updated.

I think it's something like: I'm only updating the local (copied) pointer of a->val and not the actual pointer of a (I think that sounds right).

You can't update C after you free it because after you free it it no longer exists. Perhaps that wasn't quite what you meant. — rici, Nov 01 '20 at 22:17
Why would you expect `a->val` to change? `c->prev` is `b`, not `a` — Daniel Walker, Nov 01 '20 at 22:18
Usually, you don't want to free 'c' if some-one else is still using it. For this reason, you should keep a `counter` of references in side, so you only free it if zero. — Adrian Maire, Nov 01 '20 at 22:19
I guess I didn't give enough detail. I store a matrix within each struct. I'm not freeing `c`, but rather the matrix within `c`. — traw1233, Nov 01 '20 at 22:21
@Daniel_Walker .. regarding casting malloc.. please not [this war](https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc) again. "Bad practice" is in the eye of the beholder. I can see if from both sides. Apparently some can not. — Señor CMasMas, Nov 01 '20 at 22:25
@traw1233 With matrices bolted on at the end of the question, your question is unclear. If you edit your whole question to incorporate matrices that would help. — Schwern, Nov 01 '20 at 22:28
@traw1233 An `obj` cannot have multiple parents because there can be only one value for `obj->prev`. If you want a node to have multiple parents, `obj->prev` needs to be an array `obj** prev`. — Schwern, Nov 01 '20 at 22:42
@Schwern I think that could be the issue, thanks. So would that mean updating the struct definition to `obj* prev;`? And what would actually allocating two elements to this array look like (my double pointer knowledge isn't fantastic)? It is important that I can malloc this array as it needs to be a dynamic size. — traw1233, Nov 01 '20 at 22:44
@traw1233 Maybe you should explain what this structure is supposed to be doing. — Schwern, Nov 01 '20 at 22:46
_side note:_ in your last example, doing `c->prev = (obj *)malloc(sizeof(obj));;` is leaking memory because you do: `c->prev = a;` immediately afterwards. — Craig Estey, Nov 01 '20 at 22:50
CraigEstey that makes sense, I don't think it's an actual solution. @Schwern I edited the post to give context on the purpose of the struct and the goal. — traw1233, Nov 01 '20 at 22:54
If you're worried about freeing node `c` twice, you can add a refcount to the nodes. Then, use a custom free routine like: `void nodefree(node *c) { c->refcount -= 1; if (c->refcount <= 0) free(c); }` — Craig Estey, Nov 01 '20 at 22:55
That is precisely what I am trying to do @CraigEstey, the issue is that (in my current code) `refcount` doesn't get updated globally to the main object. Instead, the temporary pointer within `prev` gets updated but does not carry through to any other pointer to that `node`. I'm working on a code example to recreate this behavior now. — traw1233, Nov 01 '20 at 22:56
Can you post [enough of] your _real_ code? At present, there is no refcount in the struct. Do you have an MRE that can create the bad hierarchy and then try to free it [all]? You're showing a tree hierarchy, but your struct only has `prev`. Wouldn't it need (e.g.) `left` and `right`? — Craig Estey, Nov 01 '20 at 23:02
And, now that I think of it, even the refcount doesn't work too well if `c` is _not_ a leaf node. Are you interested in a "deep copy" approach that dups/clones a given subtree [recursively]? Assuming we have `c` at depth 2, how do we get `c` at depth 3 anyway [that has the _same_ address as the depth 2 node]? — Craig Estey, Nov 01 '20 at 23:09
@CraigEstey I just posted updated code, I'm pretty sure *this* one accurately represents the issue. I'm really trying to avoid any sort of deep copies, I'm actually trying to do kind of the opposite (i.e. update the actual node that the pointer is pointing to instead of a copied version of that node). My solution has just been a simple boolean `done` and I think it works for the kind of graph I'm going for. — traw1233, Nov 01 '20 at 23:13

Craig Estey · Accepted Answer · 2020-11-02T00:32:07.537

Thanks for the final update.

A few things. Unless you have complexities beyond what's in your posted code, doing a malloc on obj->prev isn't needed. You can just do: obj *prev[2] instead.

Also, I'd use a custom allocate function to simplify.

The real issue seems to be not using done correctly. When created, all nodes should set it to zero. Then, recurse should check done and return immediately if done is set. Then, recurse should set done.

This may not be what you're thinking of, but I think it will get you closer.

I've refactored your code to illustrate:

#include <stdio.h>
#include <stdlib.h>

typedef struct obj obj;
struct obj {
    char name;
    int done;
    obj *link[2];
};
#define left    link[0]
#define right   link[1]

#ifdef DEBUG
#define dbgprt(_fmt...)         printf(_fmt)
#else
#define dbgprt(_fmt...)         /**/
#endif

void
recurse(obj *o)
{

    if (o->done)
        return;

    dbgprt("recurse: ENTER '%c'\n",o->name);

    printf("(before) node %c: %i\n", o->name, o->done);
    o->done = 1;
    printf("(after) node %c: %i\n", o->name, o->done);

#if 0
    if (o->prev) {
        // *this* should be generalized to handle for n elems in
        // prev, and can be handled by just adding a num_prev value
        // to obj
        recurse(o->left);
        recurse(o->right);
    }
#else
    if (o->left)
        recurse(o->left);
    if (o->left)
        recurse(o->right);
#endif

    dbgprt("recurse: EXIT '%c'\n",o->name);
}

obj *
newnode(char name,int done)
{
    obj *node = calloc(1,sizeof(obj));
    node->name = name;
#if 0
    node->done = done;
#endif
    return node;
}

int
main()
{

    obj *node_a = newnode('a',1);
    obj *node_b = newnode('b',1);
    obj *node_c = newnode('c',0);

    node_c->left = node_a;
    node_c->right = node_b;

    obj *node_d = newnode('d',0);
    node_d->left = node_c;
    node_d->right = node_b;

    recurse(node_d);

    printf("(final) node %c: %i\n", node_a->name, node_a->done);
    printf("(final) node %c: %i\n", node_b->name, node_b->done);

    return 0;
}

Here's the output:

(before) node d: 0
(after) node d: 1
(before) node c: 0
(after) node c: 1
(before) node a: 0
(after) node a: 1
(before) node b: 0
(after) node b: 1
(final) node a: 1
(final) node b: 1

UPDATE:

Ah yes okay this is far closer, I think this makes a lot more sense. The big thing that's missing is I need to be able to define n children on obj->link on demand. Sometimes it could have one child, sometimes it could have 5. That was why I was using the malloc + a pointer. Is there any way you could imagine on how to avoid doing link[2] and being able to define the number of children when instantiating the struct?

If, when you create the parent, you know the number of children, you can do a single calloc for link. This is less allocating, but you may be allocating child pointers that you don't use.

#include <stdio.h>
#include <stdlib.h>

typedef struct obj obj;
struct obj {
    char name;
    int done;

    int ncld;
    obj **link;
};

void
recurse(obj *o)
{
    obj *cld;

    if (o == NULL)
        return;
    if (o->done)
        return;

    printf("(before) node %c: %i\n", o->name, o->done);
    o->done = 1;

    for (int idx = 0;  idx < o->ncld;  ++idx) {
        cld = o->link[idx];
        recurse(cld);
    }

    printf("(after) node %c: %i\n", o->name, o->done);
}

obj *
newnode(char name,int ncld)
{
    obj *node = calloc(1,sizeof(obj));

    node->name = name;

    node->ncld = ncld;
    node->link = calloc(ncld,sizeof(obj **));

    return node;
}

void
addchild(obj *par,obj *cld)
{

    for (int idx = 0;  idx < par->ncld;  ++idx) {
        if (par->link[idx] == NULL) {
            par->link[idx] = cld;
            break;
        }
    }
}

int
main()
{

    obj *node_a = newnode('a',2);
    obj *node_b = newnode('b',2);

    obj *node_c = newnode('c',2);
    addchild(node_c,node_a);
    addchild(node_c,node_b);

    obj *node_d = newnode('d',2);
    addchild(node_d,node_c);
    addchild(node_d,node_b);

    recurse(node_d);

    printf("(final) node %c: %i\n", node_a->name, node_a->done);
    printf("(final) node %c: %i\n", node_b->name, node_b->done);

    return 0;
}

If you'd like to add an arbitrary number of children to a given node and you do not know the limit when you allocate the parent node, you can do this.

It has the virtue of never allocating more than you need but, do realloc every time a child is added to a parent can be expensive. This can be alleviated somewhat by having a cache of free/unused nodes in an allocation pool [not shown].

#include <stdio.h>
#include <stdlib.h>

typedef struct obj obj;
struct obj {
    char name;
    int done;

    int ncld;
    obj **link;
};

void
recurse(obj *o)
{
    obj *cld;

    if (o == NULL)
        return;
    if (o->done)
        return;

    printf("(before) node %c: %i\n", o->name, o->done);
    o->done = 1;

    for (int idx = 0;  idx < o->ncld;  ++idx) {
        cld = o->link[idx];
        recurse(cld);
    }

    printf("(after) node %c: %i\n", o->name, o->done);
}

obj *
newnode(char name)
{
    obj *node = calloc(1,sizeof(obj));

    node->name = name;

    return node;
}

void
addchild(obj *par,obj *cld)
{
    int idx;

    idx = par->ncld++;
    par->link = realloc(par->link,sizeof(obj **) * par->ncld);
    par->link[idx] = cld;
}

int
main()
{

    obj *node_a = newnode('a');
    obj *node_b = newnode('b');

    obj *node_c = newnode('c');
    addchild(node_c,node_a);
    addchild(node_c,node_b);

    obj *node_d = newnode('d');
    addchild(node_d,node_c);
    addchild(node_d,node_b);

    recurse(node_d);

    printf("(final) node %c: %i\n", node_a->name, node_a->done);
    printf("(final) node %c: %i\n", node_b->name, node_b->done);

    return 0;
}

For what it's worth, there is a significant amount of complexity outside of just this functionality and it pretty much has to be generalized

Fair enough. If it's complex enough, the child link could, itself, be an indirect linked list instead of an array of pointers.

Ah yes okay this is far closer, I think this makes a lot more sense. The big thing that's missing is I need to be able to define `n` children on `obj->link` on demand. Sometimes it could have one child, sometimes it could have 5. That was why I was using the malloc + a pointer. Is there any way you could imagine on how to avoid doing `link[2]` and being able to define the number of children when instantiating the struct? For what it's worth, there is a significant amount of complexity outside of just this functionality and it pretty much has to be generalized. — traw1233, Nov 01 '20 at 23:50
Luckily I know the number of children before they're already added, so I don't have to go down the realloc path. But thanks, this is perfect! — traw1233, Nov 02 '20 at 13:55

score 0 · Answer 2 · answered Nov 02 '20 at 00:14

Your structure is better drawn like so...

      a
     / \
    b  |
  /   \|
 d     c

An obj cannot have multiple parents because there can be only one value for obj->prev. I believe you're trying to solve this by allocating space for multiple objs within an obj.

If you want an obj to have multiple prevs, obj->prev needs to be an array of obj *. obj** prevs.

Thus, I need a way to update c's state with some boolean value done once I free it once so that when I come across it a second time I can avoid freeing something that has already been freed.

You have the opposite problem. You shouldn't free something which is still being referenced. If you do, the other references will be invalid.

If you want to free a, a needs to know if there's anything pointing to it. But a cannot see its children.

You could solve this with reference counting. Add a ref_count to your struct and increment or decrement it as children are added or removed.

typedef struct obj {
    // I've changed to a char for illustration.
    char value;
    // A list of previous obj*
    struct obj** prevs;
    // How many prevs are there?
    size_t num_prevs;
    // How much space is allocated for prevs?
    size_t max_prevs;
    // How many other objs refer to this?
    size_t ref_count;
} obj;

obj *obj_init(char value) {
  obj *node = malloc(sizeof(obj));

  node->prevs = NULL;
  node->num_prevs = 0;
  node->max_prevs = 0;
  node->ref_count = 0;
  node->value = value;

  return node;
}

Now we can make a struct with no prevs. When we add a prev, we need to allocate or grow prevs before adding a prev.

void obj_extend_prevs(obj *node) {
  // No memory is allocated. Allocate a slot.
  if( node->prevs == NULL ) {
    node->prevs = malloc(sizeof(*node));
    node->max_prevs = 1;
    node->num_prevs = 0;
  }
  // We're out of space. Double it.
  else if( node->num_prevs >= node->max_prevs ) {
    // Memory is allocated. Double it.
    node->max_prevs *= 2;
    node->prevs = realloc(node->prevs, sizeof(*node) * node->max_prevs);
  }
  // else we have enough space, do nothing
}

And when we add a prev, we increment the ref_count of the prev.

void obj_add_prev(obj *node, obj *prev) {
  // Extend or allocate node->prevs if needed
  obj_extend_prevs(node);

  node->prevs[node->num_prevs] = prev;
  node->num_prevs++;
  prev->ref_count++;
}

It's only safe to free a prev if it has nothing referring to it, and it refers to nothing.

void obj_try_free(obj *node) {
    // nothing refers to it, and it refers too nothing.
    if( node->ref_count == 0 && node->num_prevs == 0 ) {
        printf("Freeing %c.\n", node->value);
        free(node);
    }
}

We remove a prev by rewriting node->prevs without the prev. Then we can see if the node can be freed.

void obj_remove_prev(obj *node, obj *prev) {
  // Rewrite node->prevs without prev.
  int i = 0;
  int j = 0;
  for( ; i < node->num_prevs; i++ ) {
      if( node->prevs[i] == prev ) {
          continue;
      }

      node->prevs[j] = node->prevs[i];
      j++;
  }

  // Subtract how many we removed.
  // prev may have been there twice or not at all.
  node->num_prevs -= i - j;
  prev->ref_count--;

  obj_try_free(node);
}

And a handy function to print nodes.

void obj_print(obj *node) {
    printf("%c has %zu refs", node->value, node->ref_count);

    if( node->prevs == NULL ) {
        puts(" and no prevs");
    }
    else {
        printf(" and prevs");
        for( int i = 0; i < node->num_prevs; i++ ) {
            printf(" %c", node->prevs[i]->value);
        }
        puts("");
    }
}

And here's a quick demo.

// Set up your data structure.
obj *a = obj_init();

obj *b = obj_init();
obj_add_prev(b, a);

obj *c = obj_init();
obj_add_prev(c, a);
obj_add_prev(c, b);

obj *d = obj_init();
obj_add_prev(d, b);

// Print each node.
obj_print(a);  // a has 2 refs and no prevs
obj_print(b);  // b has 2 refs and prevs a
obj_print(c);  // c has 0 refs and prevs a b
obj_print(d);  // d has 0 refs and prevs b

// Remove d's parent b, it will be freed.
obj_remove_prev(d, b);  // Freeing d.

obj_print(a);  // a has 2 refs and no prevs
obj_print(b);  // b has 1 refs and prevs a
obj_print(c);  // c has 0 refs and prevs a b
// d has been freed.

// Remove one of c's parents, it will not be freed.
obj_remove_prev(c, a);

obj_print(a);  // a has 1 refs and no prevs
obj_print(b);  // b has 1 refs and prevs a
obj_print(c);  // c has 0 refs and prevs b

Note that reference counting is not perfect and can lead to memory leaks if there are circular references. For example, if a points to b points to c points to a.

Note that in production code you would use a pre-existing pointer array library which can grow and shrink and add and remove. But it's good to learn to do it by hand.

Another approach is to make a two-way graph. Each node tracks both its parents and children. Its list of children replaces the need for a ref_count. This requires a bit more management, but it also allows you to walk both ways through your graph.

I'll leave that as an exercise.

score 0 · Answer 3 · answered Nov 02 '20 at 01:15

You are confusing pointers and pointed at values. When you have code like:

*node3.prev = node1;
*(node3.prev+1) = node2;

This makes COPIES of node1 and node2 -- it does not make things point at node1 and node2. So when you say

First, b (node2) is getting recursed twice, but the second time the value is still set to 1

you are wrong. node2 is not getting recursed at all -- instead you are recursing to the two copies you have made of node2. Both of these copies have 'b' as their name, and have done = 1 initially, as they are copies of the unchanged original node2. Changing the done flag in either of these does not affect the done flag in the other, or the done flag in node2 (as you notice). node2 itself is never visited and never modified.

How to properly recurse and update values within a self-referential struct?

3 Answers3