0

For C++11 and above:
ad_tree_node refers to a Union data structure designed to hold an object of type ad_node or vary_node which are of type struct.

The code compiles but memory leaks are detected when used with valgrind.

Following is the code:

struct ad_node {
    std::string tag_;
    int count_;
    std::vector<int> index_;
    bool leaf_;
};

struct vary_node {
    std::string tag_;
    int index_;
    int mcv_;
};

union ad_tree_node {
    ad_node ad;
    vary_node vy;
    ad_tree_node () { std::memset(this, 0, sizeof(ad_node)); }
    ~ad_tree_node() {}
};

std::string print_ad_tree_node(const ad_tree_node& nd, bool is_ad_node) {
    std::string st = "";
    if (is_ad_node) {
        st += "ad_node: " + nd.ad.tag_+ ", " + std::to_string(nd.ad.count_) + ", ";
        if (nd.ad.leaf_) { st += " leaf, "; }
        else { st += "non-leaf, "; }
        st += "index: ";
        std::cout << nd.ad.index_.size();
        for (auto x : nd.ad.index_) { st += std::to_string(x) + ", "; }
    }
    else {
        st += "vy_node: " + nd.vy.tag_ + ", mcv_: " + std::to_string(nd.vy.mcv_) + ", ";
        st += "index: ";
        std::cout << nd.vy.index_;
    }
    return st;
}

int main () {
    ad_node t1;
    for (int i = 0; i < 10; ++i) t1.index_.push_back(i);

    t1.tag_ = "A";
    t1.leaf_ = true;
    t1.count_ = 9;

    ad_tree_node nd;
    nd.ad.tag_ = t1.tag_;
    nd.ad.leaf_ = t1.leaf_;
    nd.ad.count_ = t1.count_;
    nd.ad.index_ = std::move(t1.index_);
    std::cout << print_ad_tree_node(nd, true) << std::endl;

    return 0;
}

Compiled with following flags: g++ -std=c++11 -g3 code.cpp
valgrind --leak-check=full ./a.out

Reported leak by valgrind:

==6625== Memcheck, a memory error detector
==6625== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6625== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==6625== Command: ./a.out
==6625== 
10ad_node: A, 9,  leaf, index: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
==6625== 
==6625== HEAP SUMMARY:
==6625==     in use at exit: 72,770 bytes in 3 blocks
==6625==   total heap usage: 10 allocs, 7 frees, 73,946 bytes allocated
==6625== 
==6625== 2 bytes in 1 blocks are definitely lost in loss record 1 of 3
==6625==    at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6625==    by 0x4F593DE: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==6625==    by 0x4F596E8: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==6625==    by 0x4022EA: main (code.cpp:56)
==6625== 
==6625== 64 bytes in 1 blocks are definitely lost in loss record 2 of 3
==6625==    at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6625==    by 0x40386D: __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) (new_allocator.h:104)
==6625==    by 0x403518: std::allocator_traits<std::allocator<int> >::allocate(std::allocator<int>&, unsigned long) (alloc_traits.h:491)
==6625==    by 0x403257: std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) (stl_vector.h:170)
==6625==    by 0x402D79: void std::vector<int, std::allocator<int> >::_M_emplace_back_aux<int const&>(int const&) (vector.tcc:412)
==6625==    by 0x402B24: std::vector<int, std::allocator<int> >::push_back(int const&) (stl_vector.h:923)
==6625==    by 0x402295: main (code.cpp:49)
==6625== 
==6625== LEAK SUMMARY:
==6625==    definitely lost: 66 bytes in 2 blocks
==6625==    indirectly lost: 0 bytes in 0 blocks
==6625==      possibly lost: 0 bytes in 0 blocks
==6625==    still reachable: 72,704 bytes in 1 blocks
==6625==         suppressed: 0 bytes in 0 blocks
==6625== Reachable blocks (those to which a pointer was found) are not shown.
==6625== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==6625== 
==6625== For counts of detected and suppressed errors, rerun with: -v
==6625== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
letsBeePolite
  • 2,183
  • 1
  • 22
  • 37
  • 2
    This is undefined behavior. To correctly use C++ classes in a union, you must meticulously construct/deconstruct the appropriate union member using placement new. Even better: forget about about unions. Use `std::variant` from C++17, and it'll take care of all these details for you. – Sam Varshavchik Sep 25 '17 at 23:57
  • As indicated in an earlier SO answer https://stackoverflow.com/questions/40106941/is-a-union-members-destructor-called you will be leaking all the vary_node's strings. No implicit destruction of fields, which is normal since those fields might not exist depending on which possible item it is. – Andrew Lazarus Sep 26 '17 at 00:00
  • `memset` doesn't allocate anything, it does however destroy the strings and vectors in you non-pod structures. You can't clear those with memset. What you've done is nulled out any pointers to memory buffers that the string and vector allocated, and pretty much made them useless. – Retired Ninja Sep 26 '17 at 00:00
  • 1
    In other words, you tore your `union` to shreds when you used `memset`. Don't do it. – PaulMcKenzie Sep 26 '17 at 01:13

1 Answers1

4

Your destructor for ad_tree_node is just {}. Unlike normal class types, whose destructors will automatically invoke the destructors for each base class and each member object, for unions you have to do everything yourself. In this case, ~ad_tree_node() does not invoke ~ad_node(), and so all the memory allocated by the std::string and std::vector members is leaked. This is precisely what valgrind is complaining about.

You also have the problem of memset - which doesn't play nicely with non-standard-layout types. memset-ing a string or a vector is UB.

What you really want is a variant:

using ad_tree_node = variant<ad_node, vary_node>;

A variant, in C++ parlance, is like a union - except it knows which type it contains, manages its storage correctly per C++ object semantics, and cleans up after itself. For C++11, a good implementation is Boost.Variant.

Barry
  • 286,269
  • 29
  • 621
  • 977