1

I am trying to write something that works like the Linux command wc, counting words, newlines and bytes in any kind of file, and I can only use the C function read. I have written this code, and I am getting the correct values for newlines and bytes, but not the correct value for the word count.

int bytes = 0;
int words = 0;
int newLine = 0;
char buffer[1];
int file = open(myfile,O_RDONLY);
if(file == -1){
  printf("can not find :%s\n",myfile);
}
else{
  char last = 'c'; 
  while(read(file,buffer,1)==1){
    bytes++;
    if(buffer[0]==' ' && last!=' ' && last!='\n'){
      words++;
    }
    else if(buffer[0]=='\n'){
      newLine++;
      if(last!=' ' && last!='\n'){
        words++;
      }
    }
    last = buffer[0];
  }        
  printf("%d %d %d %s\n",newLine,words,bytes,myfile);        
} 
Youssef Khloufi
  • What's your output compared to your expected output? – Jordan Kaye Oct 23 '12 at 20:52
  • You need an 'inword' boolean that's yes when you're reading a word and no when you're not; when it changes to being 'in a word', you increment the word count. Define word to suit yourself. – Jonathan Leffler Oct 23 '12 at 20:53
  • Do you know about regular expressions? If so, search for `libpcre` and use it in your program to make it extensible; otherwise it is worth the time to learn about them. – Memos Electron Oct 23 '12 at 21:00
  • here's [how to count words in a string](http://stackoverflow.com/a/12699260/4279). You could adapt it for your case – jfs Oct 23 '12 at 21:01

2 Answers

2

Use the isspace() function from <ctype.h> to check for whitespace. It takes an int, so cast the char to unsigned char before passing it.

int isInWord = 0; /* false */
while(read(file,buffer,1)==1){
    bytes++;
    if(!isspace((unsigned char)buffer[0])){
        isInWord = 1; /* true: inside a word */
        continue;
    }
    /* Any whitespace character, including '\n', ends the current word. */
    if(buffer[0] == '\n'){
        newLine++;
    }
    if(isInWord){
        words++;
    }
    isInWord = 0;
}
Aniket Inge
  • it fails if a file ends on a non-space e.g., `"word"`. Compare to [this algorithm](http://stackoverflow.com/a/12699260/4279) – jfs Oct 23 '12 at 21:08
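The end-of-file case raised in the comment above can be avoided by counting a word when it starts rather than when it ends. The following is only a minimal sketch of that idea, not the code from the linked answer; `struct counts` and `count_fd` are hypothetical names introduced for illustration, and a "word" is assumed to be any run of bytes for which isspace() is false.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper type, introduced only for this sketch. */
struct counts {
    long lines;
    long words;
    long bytes;
};

/* Count newlines, words and bytes from an open descriptor using only read().
   A word is counted on the transition from whitespace to non-whitespace,
   so a last word without trailing whitespace is still counted. */
static void count_fd(int fd, struct counts *out)
{
    char c;
    int in_word = 0; /* 0 = between words, 1 = inside a word */

    assert(out != NULL);
    out->lines = 0;
    out->words = 0;
    out->bytes = 0;

    while (read(fd, &c, 1) == 1) {
        out->bytes++;
        if (c == '\n')
            out->lines++;
        if (isspace((unsigned char)c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;  /* first byte of a new word */
            out->words++;
        }
    }
}
```

With a descriptor returned by open(myfile, O_RDONLY), this could be called as count_fd(file, &c) and the three fields printed in the same order as in the question.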
1

You should reverse your logic. Rather than looking for a space to increment your word count, look for a non-space that follows whitespace and increment the word count there. It can also help to use a state variable instead of looking at the last char:

#include <fcntl.h>   /* open, O_RDONLY */
#include <stdio.h>   /* printf */
#include <unistd.h>  /* read, close */

int main(void)
{
   const char *myfile = "test.txt";
   int bytes = 0;
   int words = 0;
   int newLine = 0;
   char buffer[1];
   int file = open(myfile,O_RDONLY);
   enum states { WHITESPACE, WORD };
   int state = WHITESPACE;   /* start as if preceded by whitespace */
   if(file == -1){
      printf("can not find :%s\n",myfile);
      return 1;
   }
   else{
      while (read(file,buffer,1) ==1 )
      {
         bytes++;
         if ( buffer[0]== ' ' || buffer[0] == '\t'  )
         {
            state = WHITESPACE;
         }
         else if (buffer[0]=='\n')
         {
            newLine++;
            state = WHITESPACE;
         }
         else 
         {
            /* A word starts on the transition from WHITESPACE to WORD. */
            if ( state == WHITESPACE )
            {
               words++;
            }
            state = WORD;
         }
      }        
      printf("%d %d %d %s\n",newLine,words,bytes,myfile);        
      close(file);
   } 
   return 0;
}

It appears that wc has some logic for not treating punctuation characters as words, which this code does not handle.

Scooter