1

I am new to C++ and I am trying to implement Huffman Codes in it. To do so I had to implement a tree-like structure in it which required me to implement pointers in a structure. But the problem that I am facing is that every time I allocate a new value to a variable it is getting the same address and the tree that I am trying to implement is becoming self-referential at the topmost element which is preventing me from traversing it.

Here is my code -

#include<iostream>
#include<string.h>
#include<vector>
using namespace std;

struct node{
    int freq;
    string symbol;
    node *left = nullptr, *right=nullptr;
    string huff;
};

vector<node> nodes;

vector<node> nodeSort(vector<node> arr, int s){
    for(int x=0;x<s;x++){
        for(int y=x+1;y<s;y++){
            if(arr[x].freq>arr[y].freq){
                node temp = arr[x];
                arr[x] = arr[y];
                arr[y] = temp;
            }
        }
    }
    return arr;
}

void printHuff(node Node, string val=""){
    string newval = val+Node.huff;
    if(Node.left!=nullptr){
        printHuff(*Node.left,newval);
    }
    if(Node.right!=nullptr){
        cout<<2;
        printHuff(*Node.right,newval);
    }
    if(Node.left==nullptr and Node.right==nullptr){
        cout<<Node.symbol<<" -> "<<newval;
    }
}

int main(){
    char symbols[] = {'A','B','C','D'};
    int freq[] = {5,1,6,3};
    for(int x=0;x<4;x++){
        node temp;
        temp.freq = freq[x];
        temp.symbol = symbols[x];
        temp.huff = "";
        nodes.push_back(temp);
    }

    while(nodes.size()>1){
        nodes = nodeSort(nodes,nodes.size());
        node left = nodes[0];
        node right = nodes[1];
        node temp;
        left.huff += "0";
        right.huff += "1";
        temp.freq = left.freq + right.freq;
        temp.symbol = left.symbol + right.symbol;
        temp.left = &left;
        temp.right = &right;
        nodes.erase(nodes.begin());
        nodes.erase(nodes.begin());
        nodes.push_back(temp);
    }
    node a = nodes[0];
    cout<<a.left->symbol<<endl;
    cout<<a.right->symbol<<endl;
    cout<<a.right->left->symbol<<endl;
    cout<<a.right->right->symbol<<endl;
    // printHuff(nodes[0]);
    return 0;
} 

Here is the expected Huffman Tree -

expected Huffman Tree

Here is the output -

C
BDA
C
BDA

Which is definitely wrong. But I am unable to find out what referencing am I doing wrong? I am new to C++, although I have implemented the same algorithm in Python, I am unable to understand what is going wrong here.

Deekshant
  • 460
  • 7
  • 15
  • 3
    When you add nodes to your tree, you probably need to use `new` (at least indirectly) to create a new node each time. – Jerry Coffin Dec 27 '20 at 07:55
  • 1
    `temp.left = &left; temp.right = &right;` -- The `left` and `right` are temporary variables. Those variables disappear every time that `while` loop iterates. So you are storing the address of temporaries. Undefined behavior. – PaulMcKenzie Dec 27 '20 at 08:06
  • 2
    *although I have implemented the same algorithm in Python* -- C++ is not Python, Java, C#, or any other language that looks like C++. C++ is C++. To code a particular data structure correctly in C++ requires you to know C++, and not use other languages as a guide in writing the code. C++ is value-based, not reference-based, and that is the major thing that trips up programmers who do not know C++ properly, but use other languages as guides in writing C++ code. – PaulMcKenzie Dec 27 '20 at 08:08
  • You may want to read the following link for information on the lifetime of C++ objects: https://en.cppreference.com/w/cpp/language/lifetime – Andreas Wenzel Dec 27 '20 at 08:15
  • Faced with a similar question years ago, I contrived a [super-basic huffman generator](https://coliru.stacked-crooked.com/a/efc9c4cb43181829). The code is fairly descriptive, and although nowhere near optimal, the basics are there. Hopefully it will show how you can do this with smart pointers (in this case, `std::unique_ptr` to achieve what you want without killing yourself on memory management. – WhozCraig Dec 27 '20 at 08:43
  • @PaulMcKenzie by storing temporary variables, do you mean that after every iteration of the `while` loop the variables get destroyed hence the pointer does not point to the newly created data but instead, to a deleted variable that has no data? If so then how should I manage this type of temporary variables? If not then please elaborate on what do you mean by Undefined behavior. – Deekshant Dec 27 '20 at 08:49
  • you also twice erase first element of vector, but not doing anything with that, it stays erased. and all pointers are invalidated. if you use vector to store tree of constant size, you have to swap nodes when sorting them. erasing invalidates remained of sequence, you had utterly destroyed your tree by that. – Swift - Friday Pie Dec 27 '20 at 08:49
  • @WhozCraig thank you for your code, but as I am a complete beginner, it is still difficult for me to understand it. – Deekshant Dec 27 '20 at 08:52
  • @Swift-FridayPie I have to delete those 2 elements as a new element has been created containing both their properties which has to be compared with other nodes now and the 2 nodes that are being deleted are out of consideration from further comparisons. – Deekshant Dec 27 '20 at 08:55
  • https://en.cppreference.com/w/cpp/container/vector/erase `Invalidates iterators and references at or after the point of the erase, including the end() iterator.` In layman term that means that all nodes were moved, saved pointers no longer point to their prior locations. In general vector is not consistent with tree implementation, you'd like to use deque. Or a vector of pointers and then you have `new` them, but that's overkill at this point you can just crate your own container. – Swift - Friday Pie Dec 27 '20 at 08:58
  • @john that implies he needs to stop using `vector` and use `vector` (better not, better smart pointer). `vector` stores copies. As well the sorting algorithm should use reference , not local copies. – Swift - Friday Pie Dec 27 '20 at 09:05
  • @Swift-FridayPie thank you for that explanation I understood your point. Could you also be kind enough to point me towards some references of said methods of the correct of doing it? – Deekshant Dec 27 '20 at 10:34
  • 1
    @Deekshant -- Notre that I mentioned you are storing the *address* of the temporary variable. Since the variable is temporary, what does that address point to when the variable goes out of scope? – PaulMcKenzie Dec 27 '20 at 11:12
  • @PaulMcKenzie I guess it points to nothing? – Deekshant Dec 28 '20 at 15:34
  • No, it doesn't point to nothing. It points to somewhere that is no longer valid to access. I stated this previously -- unlike other computer languages, C++ allows you to shoot yourself in the foot. – PaulMcKenzie Dec 28 '20 at 19:46

1 Answers1

2

Problem is in node.erase() it is deleting the nodes which you are pointing through temp.left and temp.right. A slightly better way is to form vector of node* then form nodes through new operator and store the pointer in the vector. This way when you do node.erase() it will only erase the pointer and not the node itself. Code below:

#include<iostream>
#include<string.h>
#include<vector>
using namespace std;

struct node{
    int freq;
    string symbol;
    node *left = nullptr, *right=nullptr;
    string huff;
};

vector<node *> nodes;

vector<node*> nodeSort(vector<node*> arr, int s){
    for(int x=0;x<s;x++){
        for(int y=x+1;y<s;y++){
            if(arr[x]->freq>arr[y]->freq){
                node *temp = arr[x];
                arr[x] = arr[y];
                arr[y] = temp;
            }
        }
    }
    return arr;
}

void printHuff(node *Node, string val=""){
    
    if(Node->left!=NULL){
        printHuff(Node->left,val+"0");
    }
    if(Node->right!=NULL){
        // cout<<2;
        printHuff(Node->right,val+"1");
    }
    if(Node->left==NULL and Node->right==NULL){
        cout<<Node->symbol<<" -> "<<val<<endl;
    }
}

int main(){
    char symbols[] = {'A','B','C','D'};
    int freq[] = {5,1,6,3};
    for(int x=0;x<4;x++){
        node *temp = new node;
        temp->freq = freq[x];
        temp->symbol = symbols[x];
        temp->huff = "";
        nodes.push_back(temp);
    }

    for(auto it: nodes){
        cout<<it->symbol<<" "<<it->freq<<endl;
    }
    while(nodes.size()>1){
        nodes = nodeSort(nodes,nodes.size());
        node *left  = nodes[0];
        node *right = nodes[1];
        node *temp = new node;
        // left.huff += "0";
        // right.huff += "1";
        temp->freq = left->freq + right->freq;
        temp->symbol = left->symbol + right->symbol;
        temp->left = left;
        temp->right = right;
        nodes.erase(nodes.begin());
        nodes.erase(nodes.begin());
        nodes.push_back(temp);
    }
    // node a = nodes[0];
    // cout<<a.left->symbol<<endl;
    // cout<<a.right->symbol<<endl;
    // cout<<a.right->left->symbol<<endl;
    // cout<<a.right->right->symbol<<endl;
    printHuff(nodes[0]);
    return 0;
} 

Also do not use using namespace std. Read this question and welcome to C++.

dark_prince
  • 237
  • 2
  • 12
  • The answer does seem to print the correct output but how does using the `new` operator affect the outcome instead of what I was doing? – Deekshant Dec 28 '20 at 15:44
  • actually, `new` operator is creating a node and returning the pointer to that node so when you are doing `nodes.erase()` it is just erasing the pointer and not the node that you have created using the `new` operator, but in your code you are taking vector of nodes and by doing `nodes.erase()` you are deleting the node itself. – dark_prince Dec 30 '20 at 04:36