0

I am trying to read in a file from the command line, read the characters into an array, count the characters individuality and print the results. the code compiles without any errors but the individual letter counts are much higher than they should be for large files and it does not count at all sometimes for small files.

#include <iostream>
#include <fstream>
#include <string.h> 
#include <cctype> 
#include <algorithm>

using namespace std;


int main(int argc, char **argv){ 
if(argc !=2){
     cout << "usage: " << argv[0] << " <filename>\n";
}
else{
     ifstream myfile (argv[1]); 
     if(!myfile.is_open())  //check to see if file is opened.
        cout << "Can not open your file\n";
    else{
        static int array[26];  //stores the frequency of letters.
        char t;       
    while(!myfile.eof()){     
        while(myfile.get(t)){
            if(isupper(t)){    
                int i = 0;
                do{
                    array[tolower(t)-'a']++;
                    i++;
                }
            while(i < 26);
            }
        }
    int x = 0;
    while(x < 26){
        cout << 'a' + x << ":" << array[x] << endl;
        x++;
    }
    }
    }
    }
return 0;
}
pb2q
  • 58,613
  • 19
  • 146
  • 147
  • 2
    You need to initialise `array` - also it doesn't need to be static. And please fix your code formatting - it's painful to look at. – Paul R Oct 11 '12 at 03:29
  • 1
    You seem to be counting only upper case characters, but each character 26 times..? – jogojapan Oct 11 '12 at 03:30
  • looks really complex, take a look at [this](http://stackoverflow.com/a/4761779/942596) it shows how to move a file into a vector. I would use that and place each character into a std::multiset. – andre Oct 11 '12 at 03:46

2 Answers2

1

The problem is myfile.get(t) which extracts a character from the stream and gets it in t. Now if the read character happens to be upper-case you are iterating over the array 26 times incrementing its lower-case letter count. You need to do it just once.

Also you need to handle non-alpha characters in the input stream.

while(!myfile.eof()){     
    myfile.get(t);
    if(isalpha(t) { // process only alphabets.
        if(isupper(t)) // convert upper case to lower case
            t = tolower(t);
        array[t-'a']++; // increment the count
    }
}
Paul R
  • 208,748
  • 37
  • 389
  • 560
codaddict
  • 445,704
  • 82
  • 492
  • 529
  • 1
    In fact, if the idea is to do a case-insensitive count, you don't even need to do the `if (isupper(t))`, just call `tolower(t)` all the time. Paul R's comment on the question about initializing `array` with zeros still applies. – Brian L Oct 11 '12 at 03:42
1

This doesn't accept a file name on the command line (just processes its standard input) but might provide some inspiration for a general approach that could be somewhat simpler:

#include <ctype.h>
#include <iostream>
#include <vector>
#include <fstream>
#include <iterator>
#include <algorithm>

int main() {
    std::vector<size_t> freqs(26);

    std::for_each(std::istream_iterator<char>(std::cin), 
        std::istream_iterator<char>(),
        [&](char ch) { if(isalpha(ch)) ++freqs[tolower(ch)-'a']; });

    for (int i=0; i<26; i++)
        std::cout << (char)('a'+i) << ":" << freqs[i] << "\n";

    return 0;
}
Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111