I used a file with 15 lines of 2 characters each, so I assumed its size would be around 44 bytes, but the tellg function reports 58. I also recorded every position at which the code identified a newline character; they were all consecutive, which confirmed my suspicion. Thank you!

//Tailfile - This program accepts a file and prints the last 10 lines.
//This function determines the number of lines and how to display it.
int lineidentifier(fstream& tailfile, long& position)
{
    tailfile.seekg(0, ios::end); //set the read position to the end of the file
    long n = 0;                  //number of newlines found so far
    long i = tailfile.tellg();   //current offset, starting at the file size in bytes
    char ch;                     //holds the character being checked
    //loop until 10 newlines are found or the start of the file is passed
    while (n < 10 && i >= 0)
    {
        tailfile.seekg(i, ios::beg); //move the read position to offset i, counted from the beginning
        cout << "1. " << i << endl;  //DEBUGGING EXTRA
        tailfile.get(ch);            //read the character at offset i
        tailfile.clear();            //clear the flags set on the first iteration, when i is at the end of the file
        cout << "2. " << i << endl;  //DEBUGGING EXTRA
        if (ch == '\n')              //a newline character marks a line boundary
        {
            n++;                      //one more line identified
            position = i;             //offset of the newline that was just read
            cout << position << endl; //DEBUGGING EXTRA
            cout << ch << endl;       //DEBUGGING EXTRA
            i--;                      //step back one extra byte after a newline
        }
        i--;
        cout << "4. " << i << endl;  //DEBUGGING EXTRA
    }
    cout << i << endl; //DEBUGGING EXTRA
    //i <= 1 means the scan reached the beginning of the file,
    //so the file does not have more than 10 lines
    if (i <= 1)
        return 0;
    else
        return 1;
}
NoobTove
  • afaik Windows uses `\r\n`, Linux/Unix `\n`, and classic Mac OS `\r` to represent a new line – max Jun 20 '17 at 06:38
  • Some platforms write \r\n as a newline, so each newline consists of 2 characters. However, if the file is opened in text mode, the \r\n sequence is converted to a single \n character when you read it back. What platform are you on? – nos Jun 20 '17 at 06:38
  • And if you think that's crazy, just you wait until you're introduced to endian. – user4581301 Jun 20 '17 at 07:01

3 Answers

Linux uses \n (Line Feed, 0x0A) as its new line character.

Windows/DOS uses \r\n (Carriage Return (0x0D) and Line Feed (0x0A)) as its new line character.

You are likely reading a file with DOS/Windows line endings.

This answer provides further details.
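
As a rough check against the numbers in the question: 15 lines of 2 characters make 30 bytes of text. Fourteen one-byte separators would give 44 bytes, while fourteen two-byte \r\n separators give 58, which matches what tellg reported. The sketch below counts the carriage return and line feed bytes in a buffer that was read in binary mode. The names `EolCounts` and `count_line_endings` are made up for illustration and do not come from the question.

```cpp
#include <cstddef>
#include <string>

// Counts of raw carriage return and line feed bytes (hypothetical helper type).
struct EolCounts {
    std::size_t cr = 0;  // number of '\r' bytes
    std::size_t lf = 0;  // number of '\n' bytes
};

// Scans bytes as they were read from a file opened with std::ios::binary,
// so that no newline translation has been applied to them.
EolCounts count_line_endings(const std::string& bytes)
{
    EolCounts counts;
    for (char c : bytes) {
        if (c == '\r')
            ++counts.cr;
        else if (c == '\n')
            ++counts.lf;
    }
    return counts;
}
```

If both counts come out equal and non-zero, the file most likely uses \r\n line endings. To try it on a real file, read the whole file into a std::string through a std::ifstream opened with std::ios::binary.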

Richard

Open your file with a binary file editor such as Hexedit; you'll most likely see that new lines are encoded as \r\n (0x0D, "carriage return", followed by 0x0A, "line feed"), not just \n.
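
If no hex editor is at hand, a few lines of C++ can show the same thing. This is only a minimal sketch: `hex_dump` is a name invented here, and the byte values mentioned assume an ASCII-compatible character set.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats each byte as two lowercase hex digits, separated by spaces
// (hypothetical helper for illustration).
std::string hex_dump(const std::string& bytes)
{
    std::string out;
    char buf[4];
    for (std::size_t k = 0; k < bytes.size(); ++k) {
        // Go through unsigned char so bytes above 0x7F are not sign-extended.
        unsigned value = static_cast<unsigned char>(bytes[k]);
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (k != 0)
            out += ' ';
        out += buf;
    }
    return out;
}
```

Feed it bytes read from a file opened with std::ios::binary. A line ending that shows up as 0d 0a is a carriage return followed by a line feed.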

By the way, you can simply read the file line by line using std::getline:

std::ifstream infile("thefile.txt");
std::string line;
while (std::getline(infile, line))
{
    // use line here; it does not include the trailing '\n'
}

Then you don't have to worry about how the end of line was encoded, provided the file uses the line endings native to your platform.

jpo38

The answers I've seen so far are essentially correct, but they muddle two different notions. '\n' and '\r' are escape sequences; each one represents a single character whose value is implementation-dependent. Typically those are 0x0A and 0x0D because that's often convenient, but they are not required to have those values.

When you write the character '\n' to an output stream, the runtime library does whatever is needed to produce a new line. For Unix, the convention is that the byte 0x0A means "start a new line". For Windows, the convention is that the byte 0x0A means "move down to the next line" (i.e., line feed) and the byte 0x0D means "move to the start of the current line"; the combination starts a new line.

In the ASCII encoding, the values 0x0A and 0x0D represent a line feed and a carriage return, respectively. They have no inherent connection to the C/C++ escape sequences '\n' and '\r'.
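
A small experiment can make the translation visible. The sketch below writes a string through a text-mode stream and then reads the stored bytes back in binary mode. The function name and the scratch file path are assumptions made for this example, and the result depends on the platform.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Writes text through a text-mode stream, then returns the bytes as stored
// on disk. The scratch file at path is removed afterwards.
std::string raw_bytes_after_text_write(const std::string& path,
                                       const std::string& text)
{
    {
        std::ofstream out(path);  // text mode: '\n' may be translated on output
        out << text;
    }                             // the stream is flushed and closed here
    std::ifstream in(path, std::ios::binary);  // binary mode: no translation
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    in.close();
    std::remove(path.c_str());
    return bytes;
}
```

Writing "a\n" this way would typically store 2 bytes on Linux and 3 bytes on Windows, even though the program wrote the same two characters in both cases.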

Pete Becker