
When I try to copy a file into a string using fread(), I get extra characters, and the number of extra characters is exactly equal to the number of newlines in the file. Here is my code:

#include <stdio.h>
#include <stdlib.h>
#define LEN 5000000

int main()
{
   char *in = (char*) malloc(LEN);
   FILE *f=fopen("in.txt","r");
   fread(in,5000000,1,f);
   printf("%ld\n", ftell(f)); 
   in[ftell(f)]=0;
   int l;
   for(l=0;true;l++)
   {
      if(in[l]<10)
        break;
      printf("%d ",in[l]);
   }
   printf("\n");
}

Input for this program is:

1  
2  
<newline>

Link to the input: https://paste.fedoraproject.org/388281/46780193/
As output, I print the ASCII values of the characters read:

6  
49 10 50 10 13 10  

If the input is:

1  
2  
3  
<newline>  

Link to the input: https://paste.fedoraproject.org/388280/
then the output is:

9  
49 10 50 10 51 10 51 13 10  

I tried some other test cases. In every one, the number of extra characters equals the number of newlines.
I have a few questions:
- Why does this pattern occur?
- How is this related to the fact that a newline takes 2 bytes on Windows?
- How do I get rid of those extra characters?
I googled for similar questions but didn't find an answer. Could somebody please explain?

phoenix
  • Why? Because that's exactly what is in the file. The newline character (`\n`) needs to be there to denote, well, a *new line*. And there are many ways to get rid of the newline. The best way depends on what you are trying to achieve. See for example [Removing trailing newline character from fgets() input](http://stackoverflow.com/questions/2693776/removing-trailing-newline-character-from-fgets-input) – kaylum Jul 06 '16 at 10:41
  • Please show the exact contents of your file. – 2501 Jul 06 '16 at 10:42
  • Also, why aren't you using `LEN`, which you defined specifically, instead of typing out the number in your `fread()` call? – Magisch Jul 06 '16 at 10:54
  • DOS newline is CR+LF – stark Jul 06 '16 at 10:56
  • `fread` is not intended to be used for reading text, `fgets` is the better option. – Weather Vane Jul 06 '16 at 11:02

3 Answers


Calling ftell on a stream opened in text mode, as in your example, is not meaningful[1].

The call to fread is also incorrect: the size and count arguments are switched. As written, it asks for one element of 5000000 bytes, so the read is always partial, since your file doesn't contain 5000000 characters. The values of the elements in the array after the call are therefore indeterminate[2]. (The logical element in your case is a single element of size 5000000.)

The results you're seeing aren't meaningful. Reading indeterminate values can cause undefined behavior.
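
To make the effect of the argument order concrete, here is a minimal sketch. It is not taken from the question: the helper name compare_fread_orders, the 500-byte buffer, and the use of tmpfile() are illustrative assumptions. It writes a 6-byte temporary file and reads it back with both argument orders, reporting only what fread returns in each case.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post).
 * Writes a 6-byte temporary file, then reads it back twice:
 * first as ONE element of 500 bytes, then as up to 500 elements of 1 byte.
 * The two fread return values are stored through the pointers. */
void compare_fread_orders(size_t *one_big_element, size_t *many_bytes)
{
    static const char content[] = "1\n2\n3\n";   /* 6 bytes, no terminator written */
    char buf[500];

    assert(one_big_element != NULL && many_bytes != NULL);

    FILE *f = tmpfile();                         /* opened in binary update mode */
    if (f == NULL) {
        *one_big_element = 0;
        *many_bytes = 0;
        return;
    }
    fwrite(content, 1, sizeof content - 1, f);

    /* size = 500, count = 1: no complete 500-byte element exists,
     * so fread reports 0 elements read. */
    rewind(f);
    *one_big_element = fread(buf, sizeof buf, 1, f);

    /* size = 1, count = 500: every byte is a complete element,
     * so fread reports the number of bytes actually read. */
    rewind(f);
    *many_bytes = fread(buf, 1, sizeof buf, f);

    fclose(f);
}
```

Called as compare_fread_orders(&a, &b), the first result is 0 and the second is 6, which is why only the second form gives a usable character count.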

The correct way to read your file is to pass correct parameters to fread and use the return value to determine the number of successfully read characters:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <assert.h>

int main()
{
    unsigned char in[500] = { 0 } ;
    FILE *f=fopen("in.txt","r");
    assert( f ) ;

    const size_t read = fread(in,1,500,f);
    printf( "read: %zu\n" , read );

    for( size_t index = 0 ; index < read ; index++ )
    {
        printf( "%hhu " , in[index] );
    }

    fclose( f );
}

Using this correct program, when the file has the content (dots aren't a part of the file):

.
1
2
3

.

will read and print correct values:

read: 7
49 10 50 10 51 10 10

One newline character, represented[3] by the value 10, follows each number, with an additional one at the end.


[1] (Quoted from: ISO/IEC 9899:201x, 7.21.9.4 The ftell function, paragraph 2)
For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

[2] (Quoted from: ISO/IEC 9899:201x, 7.21.8.1 The fread function, paragraph 2)
If a partial element is read, its value is indeterminate.

[3] In Windows files, a newline is represented by two characters: 13, 10 (a carriage return and a line feed). But when the file is read in text mode, a newline is always just the line feed character: 10. You saw the character 13 because the behavior of your program wasn't meaningful. If you (correctly) opened and read the file in binary mode, you would see the newline represented by both characters.
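
If the file is read in binary mode and the carriage returns are unwanted, one possible approach is to remove them from the buffer after the read. The following is only a sketch under that assumption. The helper name strip_cr is hypothetical and does not appear in the question. It drops a 13 only when it is immediately followed by a 10, and leaves any lone 13 untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post).
 * Compacts buf in place, dropping each '\r' that is directly followed
 * by '\n'. Returns the new length; bytes past it are left as they were. */
size_t strip_cr(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        /* Skip the CR of a CR LF pair; the LF is copied on the next pass. */
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;
        buf[out++] = buf[i];
    }
    return out;
}
```

The length passed in should be the value returned by fread, not the buffer size, so that only bytes that were actually read are examined.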

2501

I'm not sure exactly how it affected the program flow, but I had the same issue until I changed the file access mode from "r" to "rb", even though the file was plain text.

So, in addition to user @2501's advice (the accepted answer), this should be taken into account: the following lines

FILE* ptrFile = fopen("fileName.txt", "r");
fread(in, 500, 1, ptrFile);

should be corrected to

FILE* ptrFile = fopen("fileName.txt", "rb");
fread(in, 1L, 500, ptrFile);
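
Putting the two changes together, a small helper might look like the sketch below. The name read_binary and its return convention are assumptions made for illustration, not part of either answer. It opens the file in binary mode, uses the return value of fread as the byte count, and terminates the buffer at that position instead of relying on ftell.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post).
 * Reads at most cap - 1 bytes from path in binary mode, NUL-terminates
 * buf, and returns the number of bytes read (0 if the file can't be opened). */
size_t read_binary(const char *path, char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL && cap > 0);

    FILE *f = fopen(path, "rb");   /* "rb": no newline translation */
    if (f == NULL) {
        buf[0] = '\0';
        return 0;
    }

    /* size = 1, so the return value is the count of bytes actually read. */
    size_t n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';

    fclose(f);
    return n;
}
```

With a file saved with Windows line endings, the buffer filled this way contains both 13 and 10 for every newline, on any platform.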
ssd

If you are using Windows and edited in.txt with an editor that writes CR-LF (carriage return, line feed; ASCII 13, 10) for each newline, this will surely happen. Try writing in.txt from a program and then reading it back; it will behave as expected. Alternatively, use an editor that doesn't write CR-LF at the end of each line. Sorry, I don't know of such an editor for Windows, but some Linux editors will work.

Madhusoodan P