
My code, which counts the letters in redirected input, can't handle large data. It hangs for a while and then prints enormous numbers. The file size shouldn't matter, since the program reads a character, counts it, and moves on, so I'm stumped. `count[26]` holds the number of occurrences of each letter; where do I control the limit of these numbers?

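```cpp
#include <cctype>    // isalpha, tolower
#include <iostream>  // cin
using namespace std;

int main (int argc, char *argv[])
{
    int count[26] = { };   // one counter per letter a-z

    char c;

    c = cin.get();
    while (!cin.eof())
    {
        if (isalpha(c))
        {
            c = tolower(c);

            count[c - 'a']++;
        }

        c = cin.get();
    }
} // end main
```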
harman2012
  • Use `while (cin.get(c))`, not `while (!cin.eof())`. – chris Dec 12 '12 at 06:28
  • Not an answer, but still: What is the purpose of the `alpha` array? – jogojapan Dec 12 '12 at 06:32
  • @jogojapan Just easier to print out the characters in the end, not necessary I guess. – harman2012 Dec 12 '12 at 06:34
  • It could certainly be replaced with `(char)('a'+i)` in the for-loop and removed from while-loop altogether, yes. On a separate note, I'd be careful with `isalpha`. Depending on the locale, this may return `true` for values much greater than `'a'+26`, hence causing out-of-bound errors with the array. – jogojapan Dec 12 '12 at 06:37
  • Be aware that you assume ASCII is in use when it doesn't have to be. – chris Dec 12 '12 at 06:40
  • What exactly are the symptoms you observe? You mention "hanging", but you also seem to be getting results ("enormous numbers"). So it hangs only for a while? Could this simply mean that takes relatively long, possibly simply because the input is very large? – jogojapan Dec 12 '12 at 06:42
  • @chris The characters read are in integer form (in C as opposed to C++). Is that not correct? – harman2012 Dec 12 '12 at 06:52
  • I think a [`std::map`](http://en.cppreference.com/w/cpp/container/map) would be better in this case, e.g. `std::map<char, int> counting;` – Some programmer dude Dec 12 '12 at 06:54
  • @jogojapan Yes, it takes a while. The large counts it outputs are wrong: if I have 40 of 'a', it outputs something like 1000. – harman2012 Dec 12 '12 at 06:55
  • @harman2012, But you use constructs such as `c - 'a'`. The letters of the alphabet are not guaranteed to have consecutive character codes. One such example is EBCDIC. Numbers, however, have that guarantee, so `'3' - '0'` will always be 3. – chris Dec 12 '12 at 06:57
  • @chris Well let's take your example, it will convert that to ebcdic. The lower case letters in the ascii table are consecutive. So for 'e', the difference is 4. So in my `count[4]=1` for the letter e. And then it'll just process the rest of my characters regardless of letter order. – harman2012 Dec 12 '12 at 07:03
  • And what happens when something else comes where `'z' - 'a'` is 40? Oops, out of bounds. A map would solve that problem quite nicely, but other than that, you have to make the array big enough if you want to account for that. – chris Dec 12 '12 at 07:06
  • I am not convinced that you do not actually run into locale-related problems with `isalpha`. Could you replace the alpha-check with `if ((c>='a') && (c<'a'+26))` and see if that helps? – jogojapan Dec 12 '12 at 07:09
  • @chris Wait, can you explain how that would go out of bounds? My count[26] was for letters a-z. What's holding the value for the actual counts? I don't want to use buffers because that will limit my capabilities for large data. – harman2012 Dec 12 '12 at 07:13
  • @harman2012, I doubt it's a problem in your case, but taking EBCDIC, `'z'` has the character code 169, and `'a'` has the character code 129 because there's extra stuff thrown in the middle. – chris Dec 12 '12 at 07:18
  • @jogojapan When I checked alpha with `if ((c>='a') && (c<'a'+26))`, it gave the same large numbers except it produced zero for two letters. – harman2012 Dec 12 '12 at 07:18
  • [I tried it](http://ideone.com/Gm4pce) without problems. Can you debug it? Can you try changing `count` from `int[]` to `char[]` to see whether that kind of problem still happens (obviously, with only about 3-20 occurrences of each letter)? – Synxis Dec 12 '12 at 09:05
  • @JoachimPileborg Okay, I'm attempting to use a map because that might help with the larger input data? – harman2012 Dec 14 '12 at 06:51
