-1

I am trying to create a C program that reads a file and counts specific words.

I tried this code but I don't get any result:

#include<stdio.h>
#include<stdlib.h>
void main
{
  File *fp = fopen("file.txt","r+");
  int count =0;
  char ch[10]; 

  while((fgetc(fp)!=NULL)
   {
     while((fgetc(fp)!=NULL)
      {
        if((fgets(ch,3,fp))=="the" || (fgets(ch,3,fp))=="and")
         count++;
      }
   }
   printf("%d",count);
}
  • What do you mean by you don't get any result ? What do you get as output ? – Uchia Itachi Dec 30 '13 at 11:30
  • Please get your code to a state where it at least compiles; this would bring you a lot closer. In addition, you need to re-think your algorithm completely, because the words do not need to appear at three-character boundaries. Finally, your code does not check that the words appear by themselves, so words like "these" or "stand" would be counted incorrectly. – Sergey Kalinichenko Dec 30 '13 at 11:31
  • I think a simple comparison is not enough, because one word may be contained in the string of another, so it is necessary to cut out the word first. – BLUEPIXY Dec 30 '13 at 11:35
  • To process individual words you should probably use something like strtok or strtok_r. See also: http://stackoverflow.com/questions/12975022/strtok-r-for-mingw – Brandin Dec 30 '13 at 11:39

8 Answers

1

As you're acquiring data in blocks of 3 at a time, you're assuming that the two words "the" and "and" are aligned on 3 character boundaries. That will not, in general, be the case.

You also need to use strncmp to compare the strings.

As a first review, I'd read line by line and search each line for the words you want.
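
A minimal sketch of that line-by-line approach, assuming words are separated by whitespace or common punctuation and that each line fits in a 1024-byte buffer (a longer line would be split across reads, which could cut a word in two). The function names and the delimiter set are illustrative choices, not part of the original question:

```c
#include <stdio.h>
#include <string.h>

/* Count how many tokens in `line` are exactly "the" or "and".
   strtok writes into the buffer, so `line` must be modifiable. */
int count_targets_in_line(char *line)
{
    const char *delims = " \t\r\n.,;:!?\"()";   /* assumed word separators */
    int count = 0;

    for (char *tok = strtok(line, delims); tok != NULL; tok = strtok(NULL, delims)) {
        if (strcmp(tok, "the") == 0 || strcmp(tok, "and") == 0)
            count++;
    }
    return count;
}

/* Read the stream one line at a time and sum the per-line counts. */
int count_targets_in_stream(FILE *fp)
{
    char line[1024];
    int total = 0;

    while (fgets(line, sizeof line, fp) != NULL)
        total += count_targets_in_line(line);
    return total;
}
```

Because each token is compared as a whole, "then" and "stand" are not counted.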

I'm also unsure of your intention behind having two nested while loops.

Bathsheba
1

You can't compare strings with the equality operator; it compares the pointers, not the characters they point to. You have to use the strcmp function.
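
As a small illustration, here is a sketch of a comparison helper built on strcmp, which returns 0 when two strings have the same contents. The name is_target is made up for this example:

```c
#include <string.h>

/* Returns 1 if `word` is exactly "the" or "and", 0 otherwise. */
int is_target(const char *word)
{
    return strcmp(word, "the") == 0 || strcmp(word, "and") == 0;
}
```

With char ch[10] = "the"; the expression ch == "the" is false, because the array and the string literal live at different addresses, whereas is_target(ch) returns 1.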

There are also other problems with the code you have. For one, fgetc does not return NULL on error or at end of file, but EOF. Otherwise it returns the character read from the file.
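
A minimal sketch of a correct fgetc loop, assuming all you want is to walk the file one character at a time (count_chars is a hypothetical helper name). The result is stored in an int rather than a char so that EOF stays distinguishable from every valid character:

```c
#include <stdio.h>

/* Count the characters in a stream by reading until fgetc reports EOF. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c;                          /* int, not char: EOF is a negative int */

    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}
```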

Also, your two fgets in the condition will cause reading of two "lines" (though each "line" you read will only be two characters) from the file.

Some programmer dude
1

fgets(ch, 3, fp) makes you read 2 characters plus the null-terminator, if you want to read 3 characters and the null-terminator you want fgets(ch, 4, fp) instead. Also, you need to use strcmp to compare strings.
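
To make the size argument concrete, here is a small sketch; read3 is a hypothetical helper that wraps fgets with a size of 4, so that up to 3 characters plus the terminating '\0' fit:

```c
#include <stdio.h>
#include <string.h>

/* Read up to 3 characters into `out` (at least 4 bytes long) and return
   how many were stored, or 0 at end of file or on error. */
size_t read3(FILE *fp, char out[4])
{
    if (fgets(out, 4, fp) == NULL)  /* 4 = 3 characters + '\0' */
        return 0;
    return strlen(out);
}
```

On a file containing "the and", successive calls yield "the", " an" and "d". This also shows why reading fixed 3-character chunks misses any word that does not start on a 3-character boundary.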

Also, what are all those while loops for ?

Josselin Poiret
0

if((fgets(ch,3,fp))=="the" || (fgets(ch,3,fp))=="and")

The above line is completely useless. fgets(ch,3,fp) reads characters from the file into ch. But you cannot compare that using ==. What I would do is use strcmp and give size 4 in fgets (never forget the \0).

amrx
0

You gotta use strcmp() to compare two strings. Not relational operators.

taufique
0

Just out of my head (perhaps not the optimal way, but should be pretty easy to read and understand):

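#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WHITE_SPACE(c) ((c)==' ' || (c)=='\r' || (c)=='\n' || (c)=='\t')

// Returns the number of whitespace-separated words in the file that match
// any entry of words[], or -1 if the file cannot be read.
int CountWords(const char* fileName,int numOfWords,const char* words[])
{
    int count = 0;
    FILE* fp = fopen(fileName,"r");
    if (fp == NULL)
        return -1; // could not open the file
    fseek(fp,0,SEEK_END);
    long size = ftell(fp);
    fseek(fp,0,SEEK_SET);
    char* buf = (size >= 0) ? malloc((size_t)size + 1) : NULL;
    if (buf == NULL)
    {
        fclose(fp);
        return -1; // ftell failed or out of memory
    }
    size = (long)fread(buf,1,(size_t)size,fp); // bytes actually read
    fclose(fp);
    for (long i=0,j; i<size; i=j+1)
    {
        // find the end of the word that starts at i
        for (j=i; j<size; j++)
        {
            if (WHITE_SPACE(buf[j]))
                break;
        }
        // compare buf[i..j) against every target word
        for (int n=0; n<numOfWords; n++)
        {
            size_t len = strlen(words[n]);
            if (len == (size_t)(j-i) && !memcmp(buf+i,words[n],len))
                count++;
        }
    }
    free(buf);
    return count;
}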

Please note, however, that I have not compiled nor tested it (as I said above, "out of my head")...

barak manos
0

Take a look at String matching algorithms.

You can also find implementation examples of Boyer-Moore on GitHub.

AndreDurao
0

The line

if((fgets(ch,3,fp))=="the" || (fgets(ch,3,fp))=="and")

has a couple of problems:

  • You can't compare string values with the == operator; you need to use the strcmp library function;
  • You're not comparing the same input to "the" and "and"; when the first comparison fails, you're reading the next 3 characters from input;

Life will be easier if you abstract out the input and comparison operations; at a high level, it would look something like this:

#define MAX_WORD_LENGTH 10 // or however big it needs to be
...
char word[MAX_WORD_LENGTH + 1];
...
while ( getNextWord( word, sizeof word, fp )) // will loop until getNextWord 
{                                             // returns false (error or EOF)
  if ( match( word ) )
    count++;
}

The getNextWord function handles all the input; it will read characters from the input stream until it recognizes a "word" or until there's no room left in the input buffer. In this particular case, we'll assume that a "word" is simply any sequence of non-whitespace characters (meaning punctuation will be counted as part of a word). If you want to be able to recognize punctuation as well, this gets a bit harder; for example, a ' may be a quoting character ('hello'), in which case it should not be part of the word, or it may be part of a contraction or a possessive (it's, Joe's), in which case it should be part of the word.

#include <ctype.h>
...
int getNextWord( char *target, size_t targetSize, FILE *fp )
{
  size_t i = 0;
  int c;

  /**
   * Read the next character from the input stream, skipping
   * over any leading whitespace.  We'll add each non-whitespace
   * character to the target buffer until we see trailing 
   * whitespace or EOF.
   */
  while ( i < targetSize - 1 && (c = fgetc( fp )) != EOF ) // check for room first so no character is read and dropped
  {
    if ( isspace( c ) )
    {
      if ( i == 0 )
        continue;
      else
        break;
    }
    else
    {
      target[i++] = c;
    }
  }

  target[i] = 0;      // add 0 terminator to string
  return i > 0;       // if i == 0, then we did not successfully read a word
}

The match function simply compares the input word to a list of target words, and returns "true" (1) if it sees a match. In this case, we create a list of target words with a terminating NULL entry; we just walk down the list, comparing each element to our input. If we reach the NULL entry, we didn't find a match.

#include <string.h>
...
int match( const char *word )
{
  const char *targets[] = {"and", "the", NULL};
  const char **t = targets;  // t walks the array of string pointers

  while ( *t && strcmp( *t, word ))
    t++;

  return *t != NULL;  // evaluates to true if we match either "the" or "and"
}

Note that this comparison is case-sensitive; "The" will not compare equal to "the". If you want a case-insensitive comparison, you'll have to make a copy of the input string and convert it all to lowercase, and compare that copy to the target:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
...
int match( const char *word )
{
  const char *targets[] = {"and", "the", NULL};
  const char **t = targets;  // t walks the array of string pointers

  char *wcopy = malloc( strlen( word ) + 1 );
  if ( wcopy )
  {
    const char *w = word;
    char *c = wcopy;

    while ( *w )
      *c++ = (char) tolower( (unsigned char) *w++ );
    *c = 0;           // terminate the lowercase copy
  }
  else
  {
    fprintf( stderr, "malloc failure in match: fatal error, exiting\n" );
    exit( EXIT_FAILURE );
  }

  while ( *t && strcmp( *t, wcopy ))
    t++;

  free( wcopy );
  return *t != NULL;  // evaluates to true if we match either "the" or "and"
}
John Bode