
I am trying to write a program that prints the number of words found in a text file. Words are defined as sequences of characters separated by any amount of whitespace.

However, I have a problem when words are separated by more than one whitespace character, because then it doesn't report the right number of words.

Here is my code so far:

#include <stdio.h>

int main()
{
   FILE *fp;
   int str;   /* int, not char: fgetc returns an int so EOF can be told apart from a real character */
   int i=0;

   /* opening file for reading */
   fp = fopen("myfile.txt" , "r");
   if(fp == NULL) {
      perror("Error opening file");
      return(-1);
   }
   while(( str = fgetc(fp)) != EOF ) {
     if (str == ' ')
             ++i;
   }

   printf("%d\n", i);
   fclose(fp);

   return(0);
}

myfile.txt is:

Let's do this!      You can do it. Believe in yourself.

I'm not sure whether to use fgets, fscanf, or fgetc.

Let's say I define whitespace the way fscanf defines it when reading a string.

It prints 14, which is not right (the line contains 10 words). I'm not sure how to account for multiple whitespace characters. In this case, whitespace means any number of spaces between words.

Rohit Tigga

4 Answers


Counting a space only if it is not preceded by another space will do the trick.

#include <stdio.h>

int main()
{
   FILE *fp;
   int str;       /* int, not char, so the EOF comparison is reliable */
   int prevchar;  //tracks the previous character
   int i=0;

   /* opening file for reading */
   fp = fopen("myfile.txt" , "r");
   if(fp == NULL) {
      perror("Error opening file");
      return(-1);
   }
   prevchar='x'; //initialize prevchar to anything except a space
   while(( str = fgetc(fp)) != EOF ) {
     if (str == ' ' && prevchar!=' ') // update the count only if previous character encountered was not a space
        ++i;
     prevchar=str;
   }

   printf("%d\n", i+1); // words = runs of spaces + 1 (assumes no leading or trailing spaces)
   fclose(fp);

   return(0);
}

Edit: The code assumes that words are separated by one or more spaces and does not cover corner cases such as sentences spread over multiple lines or words separated by commas rather than spaces. These cases can be covered by adding more conditions.

santosh-patil
  • Unfortunately, this doesn't work when I add text on another line. Edit: Just read your comment. What would I need to add to account for another kind of space, such as a line break? – Rohit Tigga Mar 07 '14 at 10:05
  • Of course, this code doesn't cover many corner cases. But the gist is that you can handle them by adding more conditions. – santosh-patil Mar 07 '14 at 10:15

Just use a little state machine with two states: either you are inside a word or you are outside one. Count a word each time you move from outside to inside.

#include <stdio.h>

int main()
{
    FILE *fp;
    int str;   /* int, not char, so EOF can be detected reliably */
    int i = 0, inside_word = 0;

    /* opening file for reading */
    fp = fopen("myfile.txt" , "r");
    if(fp == NULL) {
        perror("Error opening file");
        return(-1);
    }
    inside_word = 0;
    while(( str = fgetc(fp)) != EOF ) {
        if (str == ' ' || str == '\n' || str == '\t')
            inside_word = 0;          /* whitespace: now outside a word */
        else if(inside_word == 0){
            i++;                      /* outside -> inside: a new word starts */
            inside_word = 1;
        }
    }

    printf("%d\n", i);
    fclose(fp);

    return(0);
}
tesseract

The first thing that comes to mind is to add another while loop right after ++i to skip the remaining consecutive space characters.

And by the way, be careful with your terminology: you are not handling whitespace in general, only space characters. \t and \n are whitespace too!


How about using a regular expression such as '!\s+!' to replace each run of whitespace with a single space ' ', and then continuing with your code?

Siva Senthil
  • This is the warning I got: comparison of constant 561195809 with expression of type 'char' is always false [-Wtautological-constant-out-of-range-compare] – Rohit Tigga Mar 07 '14 at 10:07
  • @XiJiaopin can you please share the code which is giving this warning? – Siva Senthil Mar 10 '14 at 14:06
  • @XiJiaopin, I stumbled on a [MSDN](http://msdn.microsoft.com/en-us/library/a9z6549f.aspx) example which matches your exact requirement. However, the code is in C++. I thought it might still be of help. – Siva Senthil Apr 15 '14 at 09:06
  • @XiJiaopin, I can only guess at the cause of the warning you mentioned above. Regex matching might return a constant such as REG_NOMATCH, which you might be treating as a boolean in an if condition; that would explain the warning. Hope this helps. – Siva Senthil Apr 15 '14 at 09:10