-3

I want to programmatically convert a string of characters stored in a file to a string of character codes (encode) by following a code table. The string of binary codes should then go to a file, from which I can revert it back to the string of characters later (decode). The codes in the code table were generated using Huffman algorithm and the code table is stored in a file.

For example, by following a code table where characters and its corresponding codes are single spaced like this:

E 110
H 001
L 11
O 111

encoding "HELLO" should output as "0011101111111"

My C++ code cannot seem to complete the encoded string. Here is my code:

int main
{
    string English;
    ifstream infile("English.txt");
    if (!infile.is_open())  
    {    
        cout << "Cannot open file.\n";    
        exit(1); 
    }

    while (!infile.eof())
    {
      getline (infile,English);
    }
    infile.close();
    cout<<endl;
    cout<<"This is the text in the file:"<<endl<<endl;
    cout<<English<<endl<<endl;

    ofstream codefile("codefile.txt");
    ofstream outfile ("compressed.txt");
    ifstream codefile_input("codefile.txt");
    char ch;
    string st;

    for (int i=0; i<English.length();)
    {
        while(!codefile_input.eof())
        {
            codefile_input >> ch >> st;
            if (English[i] == ch)
            {
                outfile<<st;
                cout<<st;
                i++;
            }
        }
    }
    return 0;
}

For an input string of "The_Quick_brown_fox_jumps_over_the_lazy_dog", the output string is 011100110, but it should be longer than that!

output image

Please help! Is there anything I have missed? (n.b. my C++ code has no syntax errors)

Rabbani Rasha
  • 216
  • 3
  • 6
  • Did you try stepping through your code, in a debugger? – Algirdas Preidžius Nov 14 '16 at 14:01
  • 1
    What do you think is going to happen after you find the first character's encoded value in `codefile.txt`, write it out, and now you have to find the second character's encoded value? Your `codefile_input` is still in the middle of the file, somewhere, it's not going to magically go back to the beginning of the file, to search the second character's encoded value, all by itself. – Sam Varshavchik Nov 14 '16 at 14:03
  • +Sam so how can I make codefile_input go back to the beginning of file? – Rabbani Rasha Nov 14 '16 at 14:12
  • @RabbaniRasha You don't need to. There is more elegant solution. Look at my answer for details. – Algirdas Preidžius Nov 14 '16 at 14:35
  • If you want to be able to **decode** the file, you'd better choose another encoding. If `L` is `11` and `O` is `111`, then what is `111111` ? You'll need something called a "Prefix Code". – MSalters Nov 14 '16 at 14:43

1 Answers1

1

Let's take a look at the main loop, you are doing your work in:

for (int i=0; i<English.length();)
{
    while(!codefile_input.eof())
    {
        codefile_input >> ch >> st;
        if (English[i] == ch)
        {
            outfile<<st;
            cout<<st;
            i++;
        }
    }
}

Your code, will read through the codefile_input once, and then will get stuck in codefile_input.eof () == true condition, and then, for (int i=0; i<English.length();) will become an infinite loop, due to the fact, that there won't be a code path, in which i is increased, and it will never reach the value equal to English.length ().

As a side note, take a read on Why is iostream::eof inside a loop condition considered wrong?.

To avoid the issue, explained above, consider reading the dictionary file, to a data container (e.g. std::map), and then, use that, while iterating through the string, that you want to encode.

For example:

std::ifstream codefile_input("codefile.txt");
char ch;
std::string str;
std::map<char, std::string> codes;
while (codefile_input >> ch >> str)
    {
    codes[ch] = str;
    }

codefile_input.close ();

for (int i=0; i<English.length(); ++i)
    {
    auto it = codes.find (English[i]);
    if (codes.end () != it)
        {
        outfile << codes->second;
        cout << codes->second;
        }
    }

Note, you will need to #include <map> to use std::map.


In addition to solving the issue, about which, your question, was actually, about, your loop:

while (!infile.eof())
{
  getline (infile,English);
}

only reads the last line of the file, while discarding all other lines, that came prior to it. If you want to process all the lines in a file, consider changing that loop to:

while (std::getline (infile, English))
    {
    /* Line processing goes here */
    }

And, since, your dictionary is unlikely to be different for different lines, you can move that logic, to the front of this loop:

std::ifstream codefile_input("codefile.txt");
char ch;
std::string str;
std::map<char, std::string> codes;
while (codefile_input >> ch >> str)
    {
    codes[ch] = str;
    }

codefile_input.close ();

ifstream infile("English.txt");
if (!infile.is_open())  
    {    
    cout << "Cannot open file.\n";    
    exit(1); 
    }

ofstream outfile ("compressed.txt");
string English;
while (std::getline (infile, English))
    {
    for (int i=0; i<English.length(); ++i)
        {
        auto it = codes.find (English[i]);
        if (codes.end () != it)
            {
            outfile << codes->second;
            cout << codes->second;
            }
        }
    }

In addition, consider adding error checking for all of the files that you open. You check if you can open file English.txt, and exit if you can't, but you don't check if you could open any other file.


On unrelated note #2, considering reading Why is “using namespace std” considered bad practice? (that's why you see me using std:: explicitly in the code, that I added).

Community
  • 1
  • 1
Algirdas Preidžius
  • 1,769
  • 3
  • 14
  • 17