0

EDIT: I think my question is completely different from the proposed duplicate. It's asking about the general case whereas my question is asking for a very specific case where the reason for the weird behavior should be traceable given how specific it is.

I have some really weird behavior in my doubly LL implementation. Basically, if I pop() (from the head) an element and then inject() (add at the tail) some other element, that last tail element now points to the head of the list for seemingly no reason (I guess instead of NULL by default or at least a random address).

I figured out how to fix the problem. When injecting, I wasn't pointing the new node's "next" to NULL.

However, I would still like to understand why the injected node would choose to point to the head without specific direction of where to point.

The effect is that if I travel the list starting from the head (but not starting from the tail), I keep looping forever as the last tail element points back to the head of the list.

EDIT: So I tried printing out the address that the pointer is pointing to just after the call to malloc in inject(), and for some crazy reason the pointer is created already pointing to the head's address; but this only happens if I call pop() before calling inject(). Incredibly weird...

int pop()
{
    node* temp = head;
    int value = temp->value;    
    head = temp->next;
    free(temp);
    head->previous = NULL;
    size--;
    return value;
}

void inject(int value)
{
    if (tail == NULL)
    {               
        tail = malloc(sizeof(node));
        tail->value = value;
        tail->next = NULL;
        tail->previous = NULL;
        head = tail;
        size++;
    }
    else
    {
        node* new_node = malloc(sizeof(node));
        printf("pointing to: %p\n", new_node->next);// points to head after pop() call
        new_node->value = value;
        tail->next = new_node;
        new_node->previous = tail;
        tail = new_node;
        //new_node->next = NULL;
        size++;
    }
}

The commented out line in inject() solves the problem but still doesn't explain why the tail would point back to the head if I inject after a pop.

Below is the code before main() in case:

typedef struct node{
    int value;      
    struct node* next;
    struct node* previous;      
}node;

node* head = NULL;
node* tail = NULL;

int head_value();
int tail_value();
void push(int);
int pop();
void inject(int);
int eject();

int size = 0;
jeremy radcliff
  • 1,049
  • 3
  • 11
  • 27
  • 2
    This is called Undefined Behaviour. If a variable is uninitialized it can contain any (semi)random value. There is no point trying to understand exactly how it happened. The result is unpredictable and can change depending on the compiler, the compiler options, the surrounding code, the weather, etc. – kaylum Feb 08 '17 at 21:45
  • Possible duplicate of [Does "Undefined Behavior" really permit \*anything\* to happen?](http://stackoverflow.com/questions/32132574/does-undefined-behavior-really-permit-anything-to-happen) – kaylum Feb 08 '17 at 21:46
  • @kaylum, sure i understand that, but clearly it can't be a coincidence that it happens to point to the head of the list; if it were really random allocation, the probability would be incredibly small. So there must be a good reason why it's pointing at the head specifically, and I'm sure that reason is traceable. – jeremy radcliff Feb 08 '17 at 22:55
  • @kaylum, I think the question you linked as a duplicate is completely different. It's asking about the general case whereas my question is asking for a very specific case where the reason for the weird behavior should be traceable given how specific it is. – jeremy radcliff Feb 08 '17 at 22:58
  • @kaylum, I would really appreciate if you could review your assessment of my question being a duplicate based on what I wrote. I think the questions are completely different and it's unfair to close mine. – jeremy radcliff Feb 08 '17 at 23:02
  • So you want us to define something that is Undefined Behaviour? No one can answer your exact question for you based on the info you have provided (besides being fruitless). It depends on what assembly your compiler generated. "it can't be a coincidence" - why not? Your `next` pointer is not explicitly set so the memory or register contains a value of what it was last used for - which is coincidentally the head pointer value. Of course if you looked at the assembly you will find out which exact instruction loaded that value. Is that forensic analysis really what you want to spend your time on? – kaylum Feb 08 '17 at 23:26
  • @kaylum, I guess I was looking for an explanation along the lines of what you last said. It does kind of make sense that given no specific location to point to, the pointer would be initialized with the last memory address used by some register. Sorry for being pushy; I think I'm realizing that this is as you said a lot more context and architecture dependent than I originally assumed. Maybe the reason is not as readily traceable as I thought; I was just dumbfounded when I saw that the newly created pointer was seemingly out of nowhere pointing to the head. – jeremy radcliff Feb 08 '17 at 23:31

1 Answers1

1
    node* new_node = malloc(sizeof(node));
    printf("pointing to: %p\n", new_node->next);// points to head after pop() call

new_node->next will contain whatever garbage malloc wants to put in there. It might happen to point to head, but you never initialized it, so that printf is trying to find meaning in garbage.

Your code scatters memory management all over the place. Rather than try to fix it, let's rewrite it using my stock advice about structs: always write functions to initialize and destroy them. Always, even if it seems silly and trivial. It avoids scattering that code all over the place, doing it slightly differently every time. It allows you to unit test the basic functions of the struct before trying to use it.It lets you focus on the algorithm, not the memory management.


First, let's make a tweak to your struct. node is a very bad name for a type. It's likely you (or somebody else) is going to want to call a variable node and cause a conflict. I've called it Node, capitalized to avoid confusion with variables and builtins.

typedef struct Node {
    int value;      
    struct Node* next;
    struct Node* previous;      
} Node;

Now we can write Node_new and Node_destroy.

Node *Node_new() {
    Node *node = malloc(sizeof(Node));
    node->value = 0;
    node->next = NULL;
    node->previous = NULL;

    return node;
}

void Node_destroy( Node *node ) {
    free(node);
}

Node_destroy might seem silly, but it frees you (or anyone else) from having to remember how to destroy a Node. And it lets you change the internal structure of Node without changing the rest of the code (which happened while writing this).


You're using globals. Globals make everything more complicated and restrict what you can do with the code. Instead, wrap things like head, tail, and size into its own structure and pass that around.

typedef struct {
    Node *head;
    Node *tail;
    size_t size;
} LinkedList;

And it needs its own create and destroy functions.

LinkedList *LinkedList_new() {
    LinkedList *list = malloc(sizeof(LinkedList));
    list->head = NULL;
    list->tail = NULL;
    list->size = 0;

    return list;
}

void LinkedList_destroy( LinkedList *list ) {
    for( Node *node = list->head; node != NULL; node = node->next ) {
        Node_destroy(list->head);
    }

    free(list);
}

Note that LinkedList_destroy takes responsibility for cleaning up all its nodes, its one less thing for the user of LinkedList to worry about and potentially screw up.

LinkedList_destroy can call Node_destroy without knowing anything about how Node works. This is how we immediately benefit from the encapsulation and abstraction of Node. But don't use recursion, the list can be arbitrarily long and recursion risks a stack overflow.


Now we can write push and pop assured that things are properly created and destroyed. Note that they take a LinkedList rather than using globals.

void LinkedList_push(LinkedList *list, int value)
{
    Node *node = Node_new();
    node->value = value;

    switch( list->size ) {
        /* The list is empty, this is the first node */
        case 0:
            list->head = list->tail = node;
            break;
        default:
            list->tail->next = node;
            node->previous = list->tail;
            list->tail = node;
            break;
    }

    list->size++;
}

int LinkedList_pop( LinkedList *list ) {
    Node *popped = list->tail;

    switch( list->size ) {
        /* The list is empty, nothing to pop */
        case 0:
            fprintf(stderr, "LinkedList was empty when popped.\n");
            exit(1);
            break;
        /* Popped the last node */
        case 1:
            list->head = list->tail = NULL;
            break;
        /* Only one node left, it's both the head and tail */
        case 2:
            list->tail = list->head;
            list->tail->previous = list->tail->next = NULL;
            break;
        default: 
            list->tail = popped->previous;
            list->tail->next = NULL;
            break;
    }

    /* Have to do this at the end because size_t is unsigned
       it can't go negative */
    list->size--;

    int value = popped->value;
    Node_destroy(popped);

    return value;
}

I've used a switch so I can clearly demarcate all the special cases.

I'm not saying this is the best implementation of push and pop, or that it's even bug free, but they can be written without worrying about whether the structs have been properly initialized or freed. You can focus on the logic, not the memory management.


And then to demonstrate it all works...

void LinkedList_print( LinkedList *list ) {
    for( Node *node = list->head; node != NULL; node = node->next) {
        printf("%d\n", node->value);
    }
}

int main() {
    LinkedList *list = LinkedList_new();

    for( int i = 0; i < 3; i++ ) {
        LinkedList_push(list, i);
    }

    while( list->size != 0 ) {
        printf("list->size: %zu\n", list->size);
        LinkedList_print(list);
        LinkedList_pop(list);
    }

    LinkedList_destroy(list);
}

$ ./test
list->size: 3
0
1
2
list->size: 2
0
1
list->size: 1
0
Schwern
  • 153,029
  • 25
  • 195
  • 336
  • Thank you so much for taking the time to write this answer, I really appreciate; there's a ton of helpful information for me in there. Just one thing I'm not sure I understand: when you declare the Node struct, you repeat `Node` twice, as in `typedef struct Node {......} Node;`, but when you declare the LinkedList struct, you only say `typedef struct {.......} LinkedList;`. Is there a specific reason for that? What's the difference? – jeremy radcliff Feb 09 '17 at 00:09
  • `typedef `. `struct Node` is a type name. `Node` is the alias. Nodes are self-referencial, but the struct definition happens *before* it's aliased. We can't use `Node` in the struct, it needs a name. `struct Node` is the name. The `LinkedList` struct isn't self-referencial, so there's no need to give it a name before aliasing it. There's no real harm in declaring `struct LinkedList`, but I don't because if people write `struct LinkedList` it breaks the encapsulation of `LinkedList` should there be a radical change to its internals later. Yes, this is practical. – Schwern Feb 09 '17 at 00:31
  • @jeremyradcliff For example, I could completely hide the actual memory of a `LinkedList` with `typedef int LinkedList`. That integer would be an index into where LinkedLists are actually stored. This is, very broadly, what a file descriptor does. By fiercely enforcing encapsulation, I can be sloppy in my implementation (I'm pretty sure I have head and tail backwards), and fix it up and very aggressively improve it later. – Schwern Feb 09 '17 at 00:35