
I am reading a file line by line and adding each line to a string. However, the string length comes out 1 greater than expected for every line, which I believe is due to the newline character. How can I prevent it from being copied?

Here is my attempt:

if (inputFile.is_open())
{
    string currentLine;
    while (!inputFile.eof())
        while( getline( inputFile, currentLine ) )
        {
            string s1=currentLine;
            cout<<s1.length();
        }
}

[Updated description] I used Notepad++ to check the length of each line by selecting it. It reports lengths such as 123, 450, 500, 120, whereas my program reports 124, 451, 501, 120. Except for the last line, every line.length() is 1 greater than expected.

tshepang
typedef1

2 Answers


It looks like inputFile has Windows-style line breaks (CRLF), but your program is splitting the input on Unix-style line breaks (LF): std::getline() breaks on \n by default, leaving the CR (\r) at the end of your string.

You'll need to trim the extraneous \rs. Here is one way to do it, along with a small test:

#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>

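void remove_carriage_return(std::string& line)
{
    // Guard against empty lines: dereferencing rbegin() on an empty
    // string is undefined behaviour.
    if (!line.empty() && *line.rbegin() == '\r')
    {
        line.erase(line.length() - 1);
    }
}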

void find_line_lengths(std::istream& inputFile, std::ostream& output)
{
    std::string currentLine;
    while (std::getline(inputFile, currentLine))
    {
        remove_carriage_return(currentLine);
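        // std::hex is sticky, so switch back to decimal before printing the length.
        // An empty line has no last character, so report 0 for it.
        output
            << "The current line is "
            << std::dec << currentLine.length()
            << " characters long and ends with '0x"
            << std::setw(2) << std::setfill('0') << std::hex
            << (currentLine.empty() ? 0 : static_cast<int>(*currentLine.rbegin()))
            << "'"
            << std::endl;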
    }
}

int main()
{
    std::istringstream test_data(
        "\n"
        "1\n"
        "12\n"
        "123\n"
        "\r\n"
        "1\r\n"
        "12\r\n"
        "123\r\n"
        );

    find_line_lengths(test_data, std::cout);
}

Output:

The current line is 0 characters long and ends with '0x00'
The current line is 1 characters long and ends with '0x31'
The current line is 2 characters long and ends with '0x32'
The current line is 3 characters long and ends with '0x33'
The current line is 0 characters long and ends with '0x00'
The current line is 1 characters long and ends with '0x31'
The current line is 2 characters long and ends with '0x32'
The current line is 3 characters long and ends with '0x33'

Things to note:

  • You don't need to test for EOF. std::getline() returns the stream, which converts to false once it can read no more from inputFile.
  • You don't need to copy a string to determine its length.
johnsyweb
  • I used Notepad++ to check the length of each line by selecting it. It reports lengths such as 123, 450, 500, 120, whereas my program reports 124, 451, 501, 120. Except for the last line, every line.length() is 1 greater than expected. – typedef1 Jan 22 '12 at 09:22
  • Can you run your program against a much smaller test file with line-lengths you can count manually? What does it report for lines of zero-length? (Mine reports 0) – johnsyweb Jan 22 '12 at 09:28

That's because you're on MS-Windows, where a "\r" is added before each "\n", and that "\r" is not removed.

Alexis Wilke
  • That is not correct; `\r` is also removed (http://stackoverflow.com/a/6089413/237483), but it is an issue if the file is stored under Windows and read under Linux. – Christian Ammer Jan 22 '12 at 10:23
  • @ChristianAmmer I would bet big bucks that this **is** the issue! Sure, when reading a file in text mode on a Windows machine, the end-of-line sequence is replaced by a `\n`, which is stripped, but this nice invisible character at the end of each line easily messes up the reads. It is simple to verify whether this is the case: `if (!s.empty() && s.back() == '\r') { std::cout << "gotcha!\n"; }` – Dietmar Kühl Jan 22 '12 at 12:27
  • @Dietmar: Maybe not for you, but for me it's an issue (Client = Windows, Server = Linux), and it seems that in the OP's case, this was really the reason. – Christian Ammer Jan 23 '12 at 19:35
  • @Christian Actually if you open the file in binary mode (which is probably not the case here...) then the \r\n would not be transformed either under MS-Windows. Whether getline() is smart enough to do that work anyway, I am not 100% sure though. – Alexis Wilke Jan 24 '12 at 09:35