-4

Evening all working on a C++ implementation of huffman coding, Here s what I have. Which oddly is working most of the time. But with certain input I am getting incorrect coding/output. Please let me know if you see what im missing here...Ill keep looking.. Thanks for your help ! Updated code to fix problems mentioned below...still getting the same problem.

#include <iostream> 
#include <queue> 
#include <list> 
#include <iterator>
#include <vector>
#include <algorithm>

class HuffmanCodes
{
struct Node
{
int data;
size_t freq;
Node* left;
Node* right;

Node()
{
   data = '\0';
   freq = 0;
}
Node(int data, size_t freq) : data(data),
                                freq(freq),
                                left(nullptr),
                                right(nullptr)
                                {}
~Node()
{
  delete left;
  delete right;
}
};

struct compare
{
 bool operator()(Node* l, Node* r)
{
   return (l->freq > r->freq);
}
};

Node* top;

void printCode(Node* root, std::string str, std::vector<int>& data, int i)
{
if(root == nullptr)
 return;

if(root->data != '$' && data[i] == root->data )
{
 std::cout << root->data +1 << " : " << str << "\n";
}
printCode(root->left, str + "0" ,data, i);
printCode(root->right, str + "1", data, i);
}

public:
  HuffmanCodes() {}
   ~HuffmanCodes()
 {
   delete top;
 }
 void GenerateCode( std::vector<int>& data, std::vector<size_t>& freq)
 {
  Node* left;
  Node* right;

  std::priority_queue<Node*, std::vector<Node*>, compare > minHeap;

  for(size_t i = 0; i < data.size(); ++i)
  {
     minHeap.push(new Node(data[i], freq[i]));
  }

   while(minHeap.size() != 1)
   {
     std::sort (data.begin(), data.end());
     left = minHeap.top();
     minHeap.pop();

     right = minHeap.top();
     minHeap.pop();

     top = new Node('$', left->freq + right->freq);
     top->left  = left;
     top->right = right;
     minHeap.push(top);
    }
    for( int j = 0; j < data.size(); j++ )
        printCode(minHeap.top(), "", data, j);
 }
};

int main()
{
   int n;
   std::cin >> n;
   std::vector<int> data;
   std::vector<size_t> freq;


   for(int i = 0; i < n; i++){
        int input;
        std::cin >> input;
        freq.push_back(input);
   }

   for(int i = 0; i < n; i++){

       data.push_back(i);
   }
  HuffmanCodes set1;

  size_t size = n;
  set1.GenerateCode(data, freq);

  return 0;
 }

Input: 20 84 87 78 16 94 36 87 93 50 22 63 28 91 60 64 27 41 27 73 37

Output:
1010 1100
1001 100010 000 01101 1011
1111 11101 100011 0100 01100
1101 0011 0101
00100 11100 00101 0111 10000

Correct output: 1010

1011 1001 100010 000 01011 1100
1111 11101 100011 0100 01010 1101 0011 0110 00100 11100 00101 0111 10000

  • `int p[n];int a[n];` -- This is not valid C++. Arrays in C++ must use a constant expression to denote the number of entries, not a runtime variable. Use `std::vector p(n), a(n);` instead. Second, `for(int i = 0; i <= n; i++)` { `std::cin >> p[i];` -- See anything wrong in that loop, like an out-of-bounds access? – PaulMcKenzie Nov 19 '18 at 03:22
  • 1
    *Which oddly is working most of the time.* -- It was never working. You have out-of-bounds accesses. The problem is that you will never know it was broken by using non-standard syntax. If you had used `std::vector`, you could have used `.at()` instead of `[ ]` to access the elements, and then you would see that your program would have failed every single time. – PaulMcKenzie Nov 19 '18 at 03:25
  • That was a typo and is not the problem.for(int i = 0; i <=n; i++) { std::cin >> p[i]; – Johnjames123 Nov 19 '18 at 04:46
  • 2
    So I guess you didn't see the obvious error in your loop. You are invoking undefined behavior by writing out-of-bounds. – PaulMcKenzie Nov 19 '18 at 04:48
  • If your referring to the equals it was a typo as i said and is not the problem, its not in the code im using and has been corrected on site. – Johnjames123 Nov 19 '18 at 04:49
  • Are you referring to something else? – Johnjames123 Nov 19 '18 at 04:49
  • 1
    Then please post your actual code, not code filled with typos -- take the code you are running and *copy and paste it* into the edit window -- don't type it in. And make the corrections in the original post, not the comment section. – PaulMcKenzie Nov 19 '18 at 04:50
  • As i said I already corrected it. I am human. – Johnjames123 Nov 19 '18 at 04:52
  • [Don't do this](https://stackoverflow.com/questions/31816095/why-should-i-not-include-bits-stdc-h). – PaulMcKenzie Nov 19 '18 at 04:53
  • `left[i] = INT_MAX;` That is in your `merge` function. You need to check your boundary conditions, as you are probably going out of bounds. I will claim that your program was never working, but undefined behavior (which can be anything) led you to believe your program was working. – PaulMcKenzie Nov 19 '18 at 05:34

2 Answers2

1

Valgrind points immediately to the incorrect code:

g++ -std=c++2a -fPIC -g -Wall -Wextra -Wwrite-strings -Wno-parentheses -Wpedantic -Warray-bounds  -Weffc++       53367469.cpp    -o 53367469
53367469.cpp: In constructor ‘MinHeapNode::MinHeapNode(int, unsigned int)’:
53367469.cpp:20:1: warning: ‘MinHeapNode::data’ should be initialized in the member initialization list [-Weffc++]
 MinHeapNode(int data, unsigned freq) {
 ^~~~~~~~~~~
53367469.cpp:20:1: warning: ‘MinHeapNode::freq’ should be initialized in the member initialization list [-Weffc++]
53367469.cpp:20:1: warning: ‘MinHeapNode::left’ should be initialized in the member initialization list [-Weffc++]
53367469.cpp:20:1: warning: ‘MinHeapNode::right’ should be initialized in the member initialization list [-Weffc++]
53367469.cpp: In function ‘int main()’:
53367469.cpp:104:8: warning: ISO C++ forbids variable length array ‘p’ [-Wvla]
 int p[n];
        ^
53367469.cpp:105:8: warning: ISO C++ forbids variable length array ‘a’ [-Wvla]
 int a[n];
        ^
53367469.cpp: In function ‘void merge(int*, int, int, int)’:
53367469.cpp:145:14: warning: ISO C++ forbids variable length array ‘left’ [-Wvla]
   int left[n1];
              ^
53367469.cpp:146:15: warning: ISO C++ forbids variable length array ‘right’ [-Wvla]
   int right[n2];
               ^
valgrind -q --leak-check=full ./53367469  <<<"$INPUT" 
==8553== Invalid read of size 8
==8553==    at 0x10A41D: HuffmanCodes(int*, int*, int) (53367469.cpp:72)
==8553==    by 0x10A704: main (53367469.cpp:115)
==8553==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==8553== 
==8553== 
==8553== Process terminating with default action of signal 11 (SIGSEGV)
==8553==  Access not within mapped region at address 0x0
==8553==    at 0x10A41D: HuffmanCodes(int*, int*, int) (53367469.cpp:72)
==8553==    by 0x10A704: main (53367469.cpp:115)
==8553==  If you believe this happened as a result of a stack
==8553==  overflow in your program's main thread (unlikely but
==8553==  possible), you can try to increase the size of the
==8553==  main thread stack using the --main-stacksize= flag.
==8553==  The main thread stack size used in this run was 8720384.

An empty queue has no valid top().


As an aside, using <bits/stdc++.h> is inefficient and non-portable; using namespace std is also unwise.

Toby Speight
  • 27,591
  • 48
  • 66
  • 103
  • I believe i fixed these things and the problems still remains.... – Johnjames123 Nov 20 '18 at 01:04
  • And did you fix the subsequent problems? You can't assume that you've finished merely because you've fixed the first bug. Use Valgrind again, and continue debugging until you can't make further progress; then ask for help. – Toby Speight Nov 20 '18 at 08:10
0

Turns out the issue was in how the code handles duplicates, if anyone is curious. Code was working just had to make some changes to the comparison in the struct to get the proper output. Thanks all for taking the time to look at and or respond to my post I appreciate the advice as always, but they were unrelated to the problem. I had to track this one down by doing the huff tree by hand. But solved now, ill post the code in a bit in case anyone is interested.