
I am reading from my dictionary and printing out the word + the length of the word for testing purposes.

I use strlen to get the length of the string. However, the numbers I got are not correct. I believe strlen doesn't count the \0 character.

I am reading the first 10 words in the dictionary. My expected output should be:

```
W:A L:1
W:A's L:3
W:AA's L:4
W:AB's L:4
W:ABM's L:5
W:AC's L:4
W:ACTH's L:6
W:AI's L:4
W:AIDS's L:6
W:AM's L:4
```

But this is what I got (notice how each L: is on the next line; I think this is where the problem is):

```
W:A
 L:2
W:A's
 L:4
W:AA's
 L:5
W:AB's
 L:5
W:ABM's
 L:6
W:AC's
 L:5
W:ACTH's
 L:7
W:AI's
 L:5
W:AIDS's
 L:7
W:AM's
 L:5
```

Below is my code:

```c
#include <stdio.h>
#include <string.h>

int main(void) {
   FILE *dict = fopen("/usr/share/dict/words", "r"); // open the dictionary for read-only access
   if (dict == NULL) {
      return 1;
   }

   int i = 0;

   // Read each line of the file (the words will eventually go into a hash table)
   char word[128];
   while (i < 10 && fgets(word, sizeof(word), dict) != NULL) {
      printf("W:%s L:%d\n", word, (int)strlen(word));
      i++;
   }

   fclose(dict);
   return 0;
}
```
PTN
  • The result of `fgets()` often includes the `'\n'`. To trim it, see http://stackoverflow.com/a/28462221/2410359 BTW, nicely formed question, although certainly a duplicate. – chux - Reinstate Monica Jul 12 '15 at 19:44
  • "I believe `strlen` doesn't count the `'\0'` character." -- No, it doesn't, and it's not supposed to. (Any reference to the `strlen` function, including `man strlen` if your system has man pages, would tell you that.) – Keith Thompson Jul 12 '15 at 20:07
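To make the comment concrete: the extra 1 in the output above comes from the `'\n'`, not from the `'\0'`. Below is a minimal sketch of the array that `fgets()` would fill for the dictionary line `A's`, assuming the file uses plain `\n` line endings. The names `line_from_fgets`, `counted_by_strlen` and `bytes_in_array` are hypothetical and exist only for illustration.

```c
#include <string.h>

/* The array fgets() stores for the line "A's" (assuming '\n' line endings):
 * {'A', '\'', 's', '\n', '\0'} */
static const char line_from_fgets[] = "A's\n";

/* strlen() counts the characters before '\0', and the '\n' is one of them: 4. */
size_t counted_by_strlen(void) { return strlen(line_from_fgets); }

/* sizeof on the array also counts the terminating '\0': 5. */
size_t bytes_in_array(void) { return sizeof line_from_fgets; }
```

So `strlen` returns 4 here even though `A's` has only three visible characters.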

2 Answers


fgets() reads in the newline into the buffer if there's enough space. As a result, you see the newline printed when you print word. From the fgets manual:

> fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. **If a newline is read, it is stored into the buffer.** A terminating null byte ('\0') is stored after the last character in the buffer.

(emphasis mine)
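The quoted behaviour can be checked in isolation without touching `/usr/share/dict/words`. This minimal sketch writes some text to a `tmpfile()`, reads it back with `fgets()` the same way the question's loop does, and returns `strlen()` of the first line. The helper name `first_line_length` is hypothetical, and returning `(size_t)-1` on failure is just a convention chosen for this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Write `text` to a temporary file, read the first line back with fgets(),
 * and return its strlen(). Returns (size_t)-1 if the temporary file cannot
 * be created or nothing could be read. */
size_t first_line_length(const char *text) {
    FILE *f = tmpfile();
    if (f == NULL)
        return (size_t)-1;

    fputs(text, f);
    rewind(f);                      /* reposition before switching to reading */

    char word[128];
    size_t len = (size_t)-1;
    if (fgets(word, sizeof word, f) != NULL)
        len = strlen(word);         /* counts the '\n' fgets() stored, if any */

    fclose(f);
    return len;
}
```

With the dictionary's first two lines, `first_line_length("A\nA's\n")` gives 2, the same `L:2` the question printed for `A`. A line with no trailing newline, such as `"AM's"`, gives 4.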

You have to trim it yourself:

```c
while (i < 10 && fgets(word, sizeof(word), dict) != NULL) {
  size_t len = strlen(word);
  // fgets() keeps the newline: overwrite it (at index len-1) with '\0'
  if (len > 0 && word[len-1] == '\n') word[len-1] = '\0';

  printf("W:%s L:%d\n", word, (int)strlen(word));
  i++;
}
```
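The same trim can be wrapped in a small helper. This sketch swaps in `strcspn()`, a common alternative to checking `word[len-1]` by hand. `strcspn(s, "\n")` returns the index of the first `'\n'`, or `strlen(s)` when there is none, so a last line without a newline is left unchanged. The name `trim_newline` is hypothetical.

```c
#include <string.h>

/* Cut the string at its first '\n' (fgets() stores at most one, at the end)
 * and return the resulting length. If there is no '\n', s[n] is already the
 * '\0' terminator, so the string is unchanged. */
size_t trim_newline(char *s) {
    size_t n = strcspn(s, "\n");
    s[n] = '\0';
    return n;
}
```

Inside the loop, `size_t len = trim_newline(word);` followed by `printf("W:%s L:%zu\n", word, len);` would print `W:A's L:3` for the second dictionary line.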
P.P

The reason is that fgets pulls the newline character '\n' into your buffer word each time, so every count is one higher than you expect.

Caleb An
  • When the last line in a file is read, it is not uncommon for it to _not_ have a trailing `'\n'`. So `fgets()` does not always pull in a `'\n'`. – chux - Reinstate Monica Jul 12 '15 at 19:52
  • So `fgets()` does pull in a `'\n'` if the newline character exists, like I said. Obviously, if the file does not have a newline character, `fgets()` will not generate one. – Caleb An Jul 12 '15 at 19:54
  • Also, if you don't have enough space in the buffer to include the newline, you'll get it at the next `fgets()`. – Luis Colorado Jul 13 '15 at 09:49
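The two cases raised in these comments, a last line with no `'\n'` and a line longer than the buffer, can both be handled in one place. This minimal sketch reads a line with `fgets()`, strips the newline if present, and otherwise discards the rest of an over-long line so that it does not show up on the next `fgets()` call. The name `read_line` and its `truncated` out-parameter are hypothetical, and discarding the excess is only one possible policy; growing the buffer is another.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (size must be > 1). Strips a trailing '\n'.
 * If the line did not fit, the remainder (up to and including '\n') is
 * consumed and *truncated is set to 1. Returns 1 if a line was read,
 * 0 on EOF or read error. */
int read_line(char *buf, int size, FILE *f, int *truncated) {
    *truncated = 0;
    if (fgets(buf, size, f) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';            /* complete line: drop the newline */
    } else if (!feof(f)) {
        /* Buffer filled before a '\n' was seen: skip the rest of this line.
         * If the very next char is '\n', the line fit exactly (no loss). */
        int c;
        while ((c = getc(f)) != EOF && c != '\n')
            *truncated = 1;
    }
    /* Otherwise: a final line without a trailing '\n'; nothing to strip. */
    return 1;
}
```

With a 4-byte buffer and the input `"ABCDEF\nAB\nXYZ\nA"`, successive calls yield `ABC` (truncated), `AB`, `XYZ` (it fits exactly, so the following `'\n'` is consumed), and `A` (last line, no newline). The next call returns 0.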