1

I had a general question earlier and got great responses, but now (even though it's the same exercise) I have a more specific question and thought it deserved it's own Q&A page.

Here's my program (not completed, but it compiles and runs ok) :

#include <stdio.h>

#define IN  1 // inside a word
#define OUT 0 // outside a word

// #define MAXWORDLENGTH 10

// print a histogram of the length of words in input. horizontal bar version

int main(void)
{
  int c, state, length;
  int one, two, three, more;

  length = 0;
  one = two = three = more = 0;

  state = OUT;
  while ((c = getchar()) != EOF) {
    if (c == ' ' || c == '\t' || c == '\n')
      state = OUT;
    else state = IN;
    if (state == IN) {
      ++length;
      while ((c = getchar()) != EOF && state != OUT) { // get next character in word
        if (c != ' ' && c != '\t' && c != '\n') // if character is not whitespace...
          ++length; // ...add one to length
        else state = OUT; // otherwise word is over
      }
    if (length != 0) {
      if (length == 1)
        ++one;
      if (length == 2)
        ++two;
      if (length == 3)
        ++three;
      if (length > 3)
        ++more;
    }
    length = 0;
    }
  }

  printf("----------------------------------------------------------------\n");
  // print histogram
  printf("ONE: %d\tTWO: %d\tTHREE: %d\tMORE: %d\n", one, two, three, more); // just making sure program collects data right, which it isn't...
  printf("----------------------------------------------------------------\n");

  return 0;
}

This should be self-explanatory with the comments. I'm using integers for the moment (will switch to an array) and only printing the tallies to make sure it's collecting data right.

Here's the actual program being compiled, run, and tested...

[matt@localhost 1.6]$ cc -Wall e-1-13.c
[matt@localhost 1.6]$ ./a.out 
1 22 333 4444 55555
----------------------------------------------------------------
ONE: 2  TWO: 1  THREE: 1    MORE: 1
----------------------------------------------------------------
[matt@localhost 1.6]$ ./a.out 
1 1 1 1 1 1
----------------------------------------------------------------
ONE: 3  TWO: 0  THREE: 0    MORE: 0
----------------------------------------------------------------
[matt@localhost 1.6]$ ./a.out 
22 22 22 22 22
----------------------------------------------------------------
ONE: 4  TWO: 1  THREE: 0    MORE: 0
----------------------------------------------------------------
[matt@localhost 1.6]$ 

... as you can see it's not. Sometimes the tally is correct, sometimes it's missing some, sometimes it's got too many. Any idea why?

Community
  • 1
  • 1
Matt2012
  • 257
  • 1
  • 2
  • 5
  • 3
    This is debugging. There are two major ways to debug. Either learn to use a debugging program like gdb, or just add a bunch of extra `printf` lines to your source to see where your program is and what data values it has at key points during execution. – aschepler Oct 21 '10 at 21:43
  • 3
    Please, please *always* use braces around the statement inside `if` blocks (specifically, if it's only one statement, because braces are required for more than one). Learn to do this now, so it won't bite you in the ass later. – Greg Hewgill Oct 21 '10 at 21:56
  • I'll try to learn a debugger aschepler. @Greg - I was using braces constantly, but someone got on me about it (saying it wasn't needed) and I thought the constant braces (when not needed) made the source look more cluttered. – Matt2012 Oct 22 '10 at 01:00

2 Answers2

4

I think you need to break out of the loop immediately when you hit the else state = OUT; in the while loop. You're consuming an extra char.

Try changing else state = OUT; to

else {
    state = OUT;
    break;
}

EDIT To elaborate on why this works, let's evaluate this code:

  // 1. while loop conditional
  while ((c = getchar()) != EOF && state != OUT) { // get next character in word
    // 2. conditional succeeded
    if (c != ' ' && c != '\t' && c != '\n') // if character is not whitespace...
      ++length; // ...add one to length
    else state = OUT; // otherwise word is over
  }

The while loop condition (1) is determined by calling getchar and then evaluating state. If this condition succeeds, the code enters into the success block (2). The problem is that you assign state = OUT in the else clause, but require the conditional to be evaluated again, which means you're going to call getchar, and then see that state == OUT. In effect, you skip processing a character you consume.

Short circuiting deals with the order in which a conditional is evaluated. Take a simple example:

int foo = 0;
int bar == 1;

if (foo && bar) { ... }

Since foo is false (0), there's no need to evaluate bar because a false value causes an && to evaluate to false regardless of the rest of the conditions. Similarly if we wrote:

if (bar || foo) {...}

then there's no need to evaluate foo since bar is true, and a true value causes an || to evaluate to true regardless of the rest of the conditionals. The ability to skip evaluating the rest of a condition is called short circuiting.

In your case, if you had swapped the order of the while loop condition to

while (state != OUT && (c = getchar()) != EOF) {

then state != OUT would evaluate to false, which would prevent evaluation of c = getchar() due to short circuiting. Here's a tutorial on it if you want to know more. Hope that helps.

NG.
  • 22,560
  • 5
  • 55
  • 61
  • Yep. Another option is to swap the order of the conditionals on the inner while and take advantage of the short-circuiting. – Dusty Oct 21 '10 at 21:45
  • agreed - exactly what JoshD is proposing in his answer. Might as well break immediately if you know the short circuit will fail anyway. – NG. Oct 21 '10 at 21:48
  • This works great, thank you. I'm not sure I understand it (or why) it works though? I'm sorry I'm new, but also what does short circuiting mean anyway? [Edit] And what do you mean by I'm "consuming an extra char." – Matt2012 Oct 22 '10 at 00:54
3

Change the order of your condition in the while loops:

No:

while(c = getchar()) != EOF && state != OUT)

Yes:

while(state != OUT && c = getchar()) != EOF)
JoshD
  • 12,490
  • 3
  • 42
  • 53
  • I used the above `break` instead, but it's my understanding that both do the same thing. I'm curious as to why this (the order) matters, if **both** conditions must be met for the loop to begin? – Matt2012 Oct 22 '10 at 00:51
  • The && operator short circuits. If the first condition is false, it doesn't bother to evaluate the second condition. – JoshD Oct 22 '10 at 17:06