2

I'm trying to count the number of times each letter appears in a file. When I run the code below it counts "Z" twice. Can anyone explain why?

The test data is:

abcdefghijklmnopqrstuvwxyz

ABCDEFGHIJKLMNOPQRSTUVWXYZ

#include <iostream>                 //Required if your program does any I/O
#include <iomanip>                  //Required for output formatting
#include <fstream>                  //Required for file I/O
#include <string>                   //Required if your program uses C++ strings
#include <cmath>                    //Required for complex math functions
#include <cctype>                   //Required for letter case conversion

using namespace std;                //Required for ANSI C++ 1998 standard.

int main ()
{
string reply;
string inputFileName;
ifstream inputFile;
char character;
int letterCount[127] = {};

cout << "Input file name: ";
getline(cin, inputFileName);

// Open the input file.
inputFile.open(inputFileName.c_str());      // Need .c_str() to convert a C++ string to a C-style string
// Check the file opened successfully.
if ( ! inputFile.is_open())
{
    cout << "Unable to open input file." << endl;
    cout << "Press enter to continue...";
    getline(cin, reply);
    exit(1);
}

while ( inputFile.peek() != EOF )
{
      inputFile >> character;
      //toupper(character);

      letterCount[static_cast<int>(character)]++;
}

for (int iteration = 0; iteration <= 127; iteration++)
{
    if ( letterCount[iteration] > 0 )
    {
         cout << static_cast<char>(iteration) << " " << letterCount[iteration] << endl;
    }
}

system("pause");
exit(0);
}
cyotee doge
  • 1,128
  • 4
  • 15
  • 33
  • 1
    Smells like homework... not sure about C but in Java I use the ASCII equivalent to count occurrences of a letter... – Myles Gray Mar 05 '11 at 20:08
  • 2
    ABCDEFGHIJKLMNOPQRSTUVQXYZ contains two "Q"s, that's the explanation for counting "Q" two times. – Tamer Shlash Mar 05 '11 at 20:09
  • 1
    Have you printed out each character as you read it in? (In your `while ( inputFile.peek() != EOF)` loop). What have you tried for debugging? – Chris Pfohl Mar 05 '11 at 20:09
  • As an aside, you want to #include for system() and exit() – fizzer Mar 05 '11 at 20:43
  • Yes, this is part of a homework assignment, but it's not the whole assignment, just one part I was having problems with. Now I can work on the rest of the assignment myself. – cyotee doge Mar 06 '11 at 01:25

5 Answers5

4

As others have pointed out, you have two Qs in the input. The reason you have two Zs is that the last

inputFile >> character;

(probably when there's just a newline character left in the stream, hence not EOF) fails to convert anything, leaving a 'Z' in the global 'character' from the previous iteration. Try inspecting inputFile.fail() afterwards to see this:

while (inputFile.peek() != EOF)
{
    inputFile >> character;

    if (!inputFile.fail())
    {
        letterCount[static_cast<int>(character)]++;
    }
}

The idiomatic way to write the loop, and which also fixes your 'Z' problem, is:

while (inputFile >> character)
{
      letterCount[static_cast<int>(character)]++;
}
fizzer
  • 13,551
  • 9
  • 39
  • 61
2

There are two Q's in your uppercase string. I believe the reason you get two counts for Z is that you should check for EOF after reading the character, not before, but I am not sure about that.

Jeremiah Willcock
  • 30,161
  • 7
  • 76
  • 78
  • Thank you for catching my oversight, I've corrected my test data and am now getting only "Z" counted twice. I tested you EOF reccomendation by changing to a do loop and I get the same error. – cyotee doge Mar 05 '11 at 20:12
2

Well, others already have pointed out the error in your code.

But here is one elegant way you can read the file and count the letters in it:

 struct letter_only: std::ctype<char> 
 {
    letter_only(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
       static std::vector<std::ctype_base::mask> 
             rc(std::ctype<char>::table_size,std::ctype_base::space);

       std::fill(&rc['A'], &rc['z'+1], std::ctype_base::alpha);
       return &rc[0];
    }
 };

struct Counter
{
    std::map<char, int> letterCount;
    void operator()(char  item) 
    { 
       if ( item != std::ctype_base::space)
         ++letterCount[tolower(item)]; //remove tolower if you want case-sensitive solution!
    }
    operator std::map<char, int>() { return letterCount ; }
};

int main()
{
     ifstream input;
     input.imbue(std::locale(std::locale(), new letter_only())); //enable reading only leters only!
     input.open("filename.txt");
     istream_iterator<char> start(input);
     istream_iterator<char> end;
     std::map<char, int> letterCount = std::for_each(start, end, Counter());
     for (std::map<char, int>::iterator it = letterCount.begin(); it != letterCount.end(); ++it)
     {
          cout << it->first <<" : "<< it->second << endl;
     }
 }

This is modified (untested) version of this solution:

Elegant ways to count the frequency of words in a file

Community
  • 1
  • 1
Nawaz
  • 353,942
  • 115
  • 666
  • 851
  • 1
    Wow, very educational result. A bit above my ability right now, but I'll work on understanding this once I'm done with the assignment. – cyotee doge Mar 06 '11 at 01:24
1

For one thing, you do have two Q's in the input.

Regarding Z, @Jeremiah is probably right in that it is doubly counted due to it being the last character, and your code not detecting EOF properly. This can be easily verified by e.g. changing the order of input characters.

As a side note, here

for (int iteration = 0; iteration <= 127; iteration++)

your index goes out of bounds; either the loop condition should be iteration < 127, or your array declared as int letterCount[128].

Péter Török
  • 114,404
  • 31
  • 268
  • 329
1

Given that you apparently only want to count English letters, it seems like you should be able to simplify your code considerably:

int main(int argc, char **argv) { 
   std::ifstream infile(argv[1]);

    char ch;
    static int counts[26];

    while (infile >> ch)
       if (isalpha(ch))
           ++counts[tolower(ch)-'a'];

    for (int i=0; i<26; i++)
        std::cout << 'A'  + i << ": " << counts[i] <<"\n";
    return 0;
}

Of course, there are quite a few more possibilities. Compared to @Nawaz's code (for example), this is obviously quite a bit shorter and simpler -- but it's also more limited (e.g., as it stands, it only works with un-accented English characters). It's pretty much restricted to the basic ASCII letters -- EBCDIC encoding, ISO 8859-x, or Unicode will break it completely.

His also makes it easy to apply the "letters only" filtration to any file. Choosing between them depends on whether you want/need/can use that flexibility or not. If you only care about the letters mentioned in the question, and only on typical machines that use some superset of ASCII, this code will handle the job more easily -- but if you need more than that, it's not suitable at all.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111