
So the assignment is to emulate the Unix command wc in C. I've got most of the structure down, but I'm having problems with the actual counting.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>   /* read(), close() */
#include <errno.h>
#include <string.h>

int main(int argc, char *argv[]){

    int file;
    int newLine=0, newWord=0, newChar=0, i=0;
    char *string;
    char buf[101];

    file = open(argv[1], O_RDONLY, 0644);

    int charRead=read(file, buf, 101);

    if (file == -1){
        printf("file does not exist");
    }
    else{
        for (i; i<100; i++){
            if (buf[i]!='\0'){
                newChar++;
            }
            if (buf[i]==' '){
                newWord++;
            }
            if (buf[i]=='\n'){
                newLine++;
            }
        }
    }
    printf("%d\n",newWord);
    printf("%d\n",newLine);
    printf("%d\n",newChar);
    printf("%s",argv[1]);
    close(file);

}

So the line counter works perfectly well.

The word count is always one short unless there is a space after the last word. I've tried to fix this with the special case:

if(buf[i]!='\0' || (buf[i]=='\0' && buf[i]!=' '))

but this doesn't seem to work either.
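Counting spaces can't work in general, because a word is not "something followed by a space": it is a run of non-whitespace characters. One way to count runs is to remember whether you are currently inside a word and count each transition from whitespace into a word. A minimal sketch of that idea, written as a hypothetical helper count_words() over a NUL-terminated string:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count words in a NUL-terminated string.
 * A word is a maximal run of non-whitespace characters, so we count
 * the transitions from "outside a word" to "inside a word". */
size_t count_words(const char *s)
{
    size_t words = 0;
    int in_word = 0;               /* are we currently inside a word? */

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;           /* any whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;           /* first character of a new word */
            words++;
        }
    }
    return words;
}
```

Because a word is counted when its first character is seen, a missing trailing space no longer matters, and leading or repeated spaces don't inflate the count.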

The other problem is that the character count is always way off. I think it has something to do with the buffer size, but I can't seem to find much documentation on how to make the buffer work in this scenario.
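The character count is off because read() does not give you a string: it returns how many bytes it actually stored (0 at end of file, -1 on error), and the rest of buf is whatever happened to be in that uninitialized array. The loop in the question always examines 100 positions regardless of charRead, and only one read() call is made, so at most 101 bytes of the file are ever looked at. Also, open()'s return value should be checked before calling read(), and the 0644 mode argument is only used together with O_CREAT, so it can be dropped here. A minimal sketch that keeps the POSIX read() approach, assuming a hypothetical helper count_fd() and a hypothetical struct wc_counts, loops until read() returns 0 and only looks at the bytes actually read. It treats any negative return as an error and does not retry on EINTR.

```c
#include <ctype.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
struct wc_counts {
    long lines, words, chars;
};

/* Sketch: count lines, words and bytes from an open file descriptor
 * using read(). Returns 0 on success, -1 if read() fails. */
int count_fd(int fd, struct wc_counts *c)
{
    char buf[101];
    ssize_t n;
    int in_word = 0;   /* kept across reads so a word split between chunks counts once */

    c->lines = c->words = c->chars = 0;

    /* read() returns the number of bytes it stored (0 at end of file),
     * so keep reading until EOF and only examine buf[0] .. buf[n-1]. */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            unsigned char ch = (unsigned char)buf[i];
            c->chars++;                 /* every byte counts, like wc -c */
            if (ch == '\n')
                c->lines++;             /* wc -l counts newline characters */
            if (isspace(ch)) {
                in_word = 0;
            } else if (!in_word) {
                in_word = 1;
                c->words++;
            }
        }
    }
    return n < 0 ? -1 : 0;
}
```

In main you would open the file with open(argv[1], O_RDONLY), report an error and return if it gives -1, then call count_fd() and close() the descriptor.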

Please advise. Thanks!

  • In reference to the word count, consider the sentence "This is a sentence.": words = 4, spaces = 3. With your method you will always need to add 1 to the final space count, because there is always one fewer space than there are words in a sentence. However, some clever person will probably find a counterexample to that. – Bruce Dean Nov 26 '13 at 03:50
  • @roybatty: such as `" This is a sentence. "` with leading and trailing spaces? It's still four words, but has more than four spaces. – Greg Hewgill Nov 26 '13 at 03:53
  • @GregHewgill, I agree, except that most sentences don't end with a space, so that would be incorrect grammar. Maybe his code could handle that case with an if-statement that catches a space before a period. A leading space, i.e., one following the period of a previous sentence, could be handled in a similar way. Those seem like two easy special cases to account for, but yes, I agree that it needs to be taken care of. – Bruce Dean Nov 26 '13 at 03:59
  • You're not off to a good start with this program, by re-implementing your own buffered I/O over POSIX-specific system calls. – Kaz Nov 26 '13 at 04:06
  • I don't understand why these two questions were chosen as duplicates. One is Java (and only contains some hints), and the other doesn't have an accepted answer (and it doesn't count characters). – Floris Nov 27 '13 at 03:31
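To make Greg Hewgill's counterexample concrete, here is the "spaces + 1" heuristic from the comments written as a hypothetical function words_by_spaces(). It is shown only to illustrate where the heuristic breaks; it is not a correct word count.

```c
#include <stddef.h>

/* Hypothetical helper: the "spaces + 1" heuristic discussed in the comments.
 * Illustration only; it miscounts leading, trailing and repeated spaces. */
size_t words_by_spaces(const char *s)
{
    size_t spaces = 0;
    for (; *s != '\0'; s++) {
        if (*s == ' ')
            spaces++;
    }
    return spaces + 1;
}
```

For " This is a sentence. " there are 5 spaces, so the heuristic reports 6 words instead of 4; two spaces between words add a phantom word, and an empty input reports 1 word. Patching each of these with special cases gets complicated quickly, which is why tracking whether you are inside a word is the usual approach.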


EDIT: I looked at the answers given in the "duplicate" questions, and I don't think they really address your question. I have written a short program that does what you want (and is "safe" in that it handles input of any size). I tested it against a short file, and it gave the same results as wc.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(int argc, char* argv[]) {
  // count characters, words, lines
  int cCount = 0, wCount = 0, lCount = 0, length;
  char buf[1000];
  FILE *fp;

  if (argc < 2) {
    fprintf(stderr, "usage: wordCount fileName\n");
    return 1;
  }

  if ((fp = fopen(argv[1], "r")) == NULL) {
    fprintf(stderr, "unable to open %s\n", argv[1]);
    return 1;
  }

  while (fgets(buf, sizeof buf, fp) != NULL) {
    int ii, isWord = 0;   // isWord: are we currently inside a word?
    lCount++;
    length = strlen(buf);
    cCount += length;
    for (ii = 0; ii < length; ii++) {
      if (isspace((unsigned char)buf[ii])) {
        isWord = 0;       // whitespace ends the current word
      } else if (!isWord) {
        wCount++;         // first character of a new word
        isWord = 1;
      }
    }
  }
  fclose(fp);
  printf("Characters: %d\nWords: %d\nLines: %d\n\n", cCount, wCount, lCount);
  return 0;
}

Note: if a line, including its newline, is longer than 999 characters, fgets() returns it in several pieces, so the line count will be too high and a word split across two pieces may be counted twice. This could be addressed by using getline(), which allocates enough memory for the whole line it reads; it is POSIX rather than standard C. I don't think you need to worry about it here. If you do, you can use the same trick as above (the isWord state): read one character at a time and count a line each time you encounter a \n. Then you don't need the inner for loop or a line buffer at all. It is marginally more memory efficient, but may be slower.
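The character-at-a-time variant described in the note could look roughly like this. It is a sketch, assuming a hypothetical helper count_stream() and a hypothetical struct wc_counts; it uses fgetc(), so stdio does the buffering and line length no longer matters.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical result type for this sketch. */
struct wc_counts {
    long lines, words, chars;
};

/* Sketch: count newlines, words and characters of an already opened
 * stream one character at a time. */
void count_stream(FILE *fp, struct wc_counts *c)
{
    int ch;            /* int, not char, so EOF can be told apart from data */
    int in_word = 0;   /* same role as isWord in the answer */

    c->lines = c->words = c->chars = 0;
    while ((ch = fgetc(fp)) != EOF) {
        c->chars++;
        if (ch == '\n')
            c->lines++;            /* like wc -l: count newline characters */
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            c->words++;
        }
    }
}
```

A caller would fopen() the file, check for NULL, call count_stream(), and fclose() the stream. Unlike the fgets() version above, a last line without a trailing \n is not counted as a line, which matches how wc -l counts newlines.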
