0

I have a C++ program that creates Huffman codes for all characters in file. It works good, but I want to create nodes without using new operator because I know that you shouldn't use it. I tried using a vector global variable for saving nodes but that doesn't work.

std::vector<Node> nodes;

Node* create_node(unsigned char value, unsigned long long counter, Node* left, Node* right) {

    Node temp;
    temp.m_value = value;
    temp.m_counter = counter;
    temp.m_left = left;
    temp.m_right = right;

    nodes.push_back(temp);
    return &nodes[nodes.size() - 1];
}

Edit: I added more code, I did't really explained what doesn't work. Problem is in generate_code(), it never reaches nullptr. I also tried using Node and not Node* but the same thing happened.

void generate_code(Node* current, std::string code, std::map<unsigned char, std::string>& char_codes) {

    if (current == nullptr) {
        return;
    }

    if (!current->m_left && !current->m_right) {
        char_codes[current->m_value] = code;
    }

    generate_code(current->m_left, code + "0", char_codes);
    generate_code(current->m_right, code + "1", char_codes);
}


void huffman(std::ifstream& file) {

    std::unordered_map<unsigned char, ull> char_frequency;
    load_data(file, char_frequency);

    std::priority_queue<Node*, std::vector<Node*>, Comparator> queue;

    for (auto& node : char_frequency) {
        queue.push(create_node(node.first, node.second, nullptr, nullptr));
    }

    while (queue.size() != 1) {

        Node* left = queue.top();
        queue.pop();
        Node* right = queue.top();
        queue.pop();

        auto counter = left->m_counter + right->m_counter;
        queue.push(create_node('\0', counter, left, right));
    }
    
    std::map<unsigned char, std::string> char_codes;
    Node* root = queue.top();
    generate_code(root, "", char_codes);

    for (auto& i : char_codes) {
        std::cout << +i.first << ": " << i.second << "\n";
    }
}
  • Can you be more specific about "doesn't work"? – François Andrieux Nov 13 '20 at 20:19
  • 4
    This is a tough thing to get right. Every time you push into `nodes`, you run the risk of a resize and invalidating all of the pointers you've returned. This may be a good time to use [smart pointers.](https://en.wikipedia.org/wiki/Smart_pointer) – user4581301 Nov 13 '20 at 20:20
  • 3
    Using an `std::vector` is not a good solution. It may be difficult to remove the `Node` from `nodes` when you are done with it, and it is easy to forget to do so. This is probably worse than just using `new`. The right solution would be to use value semantics (return `Node` instead of `Node*`) or using `std::unique_ptr`. – François Andrieux Nov 13 '20 at 20:20
  • 2
    My crystal ball tells the addresses in both `left` and `right` are from prior-inserted members in the vector via the results of this function, and are therefore invalidated on resize. How about you clarify with an actual [mcve]. If all nodes are being stored in this sequence, why not just store *index* values rather than pointers or your left/right members? – WhozCraig Nov 13 '20 at 20:23
  • 3
    I'm sick of this trend of not using something just because someone said so. – Hrisip Nov 13 '20 at 20:23
  • 2
    `new` isn't bad by itself. It becomes bad when you do not fully nail down [ownership](https://stackoverflow.com/questions/49024982/what-is-ownership-of-resources-or-pointers) of the object, and even if you do it's surprisingly hard to make sure you hit the right `delete` at the right time. Your central repository of nodes isn't a bad idea, but it won't work with a dynamic number of nodes. [Here's a presentation on one way you can get your structure working with dynamic allocation and keep ownershiip under control](https://www.youtube.com/watch?v=JfmTagWcqoE). – user4581301 Nov 13 '20 at 20:27
  • `without using new operator because I know that you shouldn't use it.` why wouldn't you use it? And why would another solution that uses new in the background be better? (PS: you want std::list to have stable pointers or just std::make_unique) – JVApen Nov 13 '20 at 20:28
  • Though an incredibly remedial example, [you might find this interesting](https://ideone.com/CKQOQk). – WhozCraig Nov 13 '20 at 20:48

2 Answers2

6

The general answer is of course to use smart pointers, like std::shared_ptr<Node>.
That said, using regular pointers is not that bad, especially if you hide all pointers from the outside. I wouldn't agree with "you shouldn't use new", more like "you should realize that you have to make sure not to create a memory leak if you do".

In any case, for something like you do, especially with your vector, you don't need actual pointers at all. Simply store an index for your vector and replace every occurence of Node* by int, somewhat like:

class Node
{
    public:

        // constructors and accessors

    private:

        ValueType value;
        int index_left;
        int index_right;
}

I used a signed integer as index here in order to allow storing -1 for a non-existent reference, similar to a null pointer.
Note that this only works if nothing gets erased from the vector, at least not before everything is destroyed. If flexibility is the key, you need pointers of some sort.

Also note that you should not have a vector as a global variable. Instead, have a wrapping class, of which Node is an inner class, somewhat like this:

class Tree
{
    public:

        class Node
        {
        ...
        };

        // some methods here

    private:
        
        vector<Node> nodes;
}

With such an approach, you can encapsulate your Node class better. Tree should most likely be a friend. Each Node would store a reference to the Tree it belongs to.

Another possibility would be to make the vector a static member for Node, but I would advise against that. If the vector is a static member of Node or a global object, in both cases, you have all trees you create being in one big container, which means you can't free your memory from one of them when you don't need it anymore.
While this would technically not be a memory leak, in practice, it could easily work as one.
On the other hand, if it is stored as a member of a Tree object, the memory is automatically freed as soon as that object is removed.

Aziuth
  • 3,652
  • 3
  • 18
  • 36
2

but I want to create nodes without using new operator because I know that you shouldn't use it.

The reason it is discouraged to use new directly is that the semantics of ownership (i.e. who is responsible for the corresponding delete) isn't clear.

The c++ standard library provides the Dynamic memory management utilities for this, the smart pointers in particular.

So I think your create function should look like follows:

std::unique_ptr<Node> create_node(unsigned char value, unsigned long long counter, Node* left, Node* right) {

    std::unique_ptr<Node> temp = std::make_unique<Node>();
    temp->m_value = value;
    temp->m_counter = counter;
    temp->m_left = left;
    temp->m_right = right;

    return temp;
}

This way it's clear that the caller takes ownership of the newly created Node instance.

πάντα ῥεῖ
  • 1
  • 13
  • 116
  • 190