I have written a C program to count the words, characters and lines in a text file. The program counts lines and words correctly but does not count the total number of characters correctly.
I am using Git Bash on Windows, so I used the wc command to check my program's correctness. wc always reports x more characters than my program does, where x is the number of newline characters my program counts.
Here is my program:
#include <stdio.h>

#define IN 1  // getc is currently inside a word
#define OUT 0 // getc has finished a word and is now reading whitespace

int main(void)
{
    FILE *fp = fopen("lorum ipsum.txt", "r");
    if (fp == NULL) // stop early if the file cannot be opened
    {
        perror("fopen");
        return 1;
    }
    int lineCount = 0;
    int wordCount = 0;
    int charCount = 0;
    int c;
    int position = IN; // whether getc is inside a word (IN) or between words (OUT)
    while ((c = getc(fp)) != EOF)
    {
        if (c == '\n')
        {
            lineCount++;
        }
        if (c == '\n' || c == '\t' || c == ' ')
        {
            if (position == IN) // just finished reading a word
            {
                wordCount++;
                position = OUT; // now reading whitespace
            }
        }
        else if (position == OUT)
        {
            //puts("This position is reached");
            position = IN; // currently reading a word
        }
        charCount++; // counted once for every character getc returns
    }
    // printing to output
    printf("%d %d %d\n", lineCount, wordCount, charCount);
    fclose(fp);
    return 0;
}
The code as a whole does not matter here; what matters is that I increment the charCount variable for every character read by getc in the while loop.
Also, I checked the size of the '\n' character using sizeof(); it is just a simple character and occupies 1 byte, so it should be counted as one.
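As a side note on that sizeof check, here is a minimal sketch (the helper names are hypothetical, not part of my program). In C a character constant such as '\n' has type int, so sizeof('\n') gives sizeof(int), while sizeof(char) is 1 by definition. Either way, getc returns the newline as a single character, so it should add exactly one to the count.

```c
#include <stddef.h>

/* Hypothetical helper: size of the character constant '\n'.
   In C, character constants have type int, so this is sizeof(int). */
size_t newline_constant_size(void)
{
    return sizeof('\n');
}

/* Hypothetical helper: size of the char type, which is 1 by definition. */
size_t char_type_size(void)
{
    return sizeof(char);
}
```

Printing both values with the %zu format shows the difference; a result of 1 for the newline usually means the check was done on a char variable or with sizeof(char).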
Also, from the file size I found that wc is reporting the correct result. So what is the problem? Is there an issue with the encoding in which my text file is stored?
NOTE: Every time I add a newline to my text file by pressing ENTER, the file size increases by two, and so does the number of characters counted by the wc command, but my program's character count increases by only one.
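One way to see what ENTER actually writes is to tally the line-ending bytes with the file opened in binary mode. This is only a sketch; tally_line_endings is a hypothetical helper name, and it assumes the file fits the usual getc loop shown in my program.

```c
#include <stdio.h>

/* Hypothetical helper: counts carriage-return ('\r') and line-feed ('\n')
   bytes in a file opened in binary mode, so no translation is applied.
   Returns 0 on success, -1 if the file cannot be opened. */
int tally_line_endings(const char *path, long *cr, long *lf)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    int c;
    *cr = 0;
    *lf = 0;
    while ((c = getc(fp)) != EOF)
    {
        if (c == '\r')
            (*cr)++;
        else if (c == '\n')
            (*lf)++;
    }
    fclose(fp);
    return 0;
}
```

If the two tallies come out equal, each line break in the file is stored as two bytes, which would match the file growing by two per ENTER.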
EDIT: From the good answers I understood that there is an extra \r character before each newline in my file. When the file is opened in r (text) mode on Windows, each \r\n pair is read as a single \n; only in binary mode, rb, does the actual \r\n show up. Here is an answer about this behavior:
what's the differences between r and rb in fopen
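To compare the two modes directly, the same counting loop can be run with the mode passed in as an argument. This is a minimal sketch, and count_chars is a hypothetical helper name rather than anything from the answers.

```c
#include <stdio.h>

/* Hypothetical helper: counts every character getc returns when the file
   is opened with the given mode ("r" for text, "rb" for binary).
   Returns -1 if the file cannot be opened. */
long count_chars(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;

    long n = 0;
    while (getc(fp) != EOF)
        n++;

    fclose(fp);
    return n;
}
```

For a file containing a\r\nb\r\n, count_chars(path, "rb") returns 6, the byte count that wc -c reports. With "r" on Windows the result should be 4, because each \r\n is delivered as one \n; on systems where text and binary modes behave the same, both calls return 6.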