First off, there is only one kind of newline character: '\n'. However, some systems use an end-of-line sequence consisting of a newline and a carriage return ("\n\r") or a carriage return and a newline ("\r\n"). These sequences made some sense for printers with a moving print head: sending a newline would advance to the next line while keeping the horizontal position, and sending a carriage return would move the head back to the start of the line. From the looks of it, you have a file using newlines and carriage returns for different purposes, but reading the file in text mode conflates them with the end-of-line sequence. Part of the mystery can probably be addressed by opening the file in binary mode, i.e., adding the flag std::ios_base::binary when opening the file.
That wouldn't change the behavior of std::getline(), however: this function reads up to the first line termination character, which by default is newline ('\n'). To read up to a different character, you'd pass it as an additional argument (I'm using the non-member function because it deals with arbitrarily long strings, rather than the member function, which reads into a char array; the member function could be used similarly):
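    // Needs <fstream>, <iostream>, <sstream>, and <string>.
    std::ifstream in("file.csv", std::ios_base::binary);
    // Outer loop: one record per '\n'-terminated line.
    for (std::string line; std::getline(in, line); ) {
        std::istringstream sin(line);
        // Inner loop: split the record at each '\r'.
        for (std::string field; std::getline(sin, field, '\r'); ) {
            std::cout << "field='" << field << "'\n";
        }
    }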
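To make the nested std::getline() idea easier to try out, here is a minimal sketch that wraps it in a function taking any std::istream. The name read_records and its sep parameter are made up for illustration, and the '\r' default is only my guess at the separator, not something confirmed about your file:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of any library): reads records terminated
// by '\n' and splits each record into fields at `sep`.
// Assumption: the field separator is '\r' unless the caller says otherwise.
std::vector<std::vector<std::string>> read_records(std::istream& in, char sep = '\r') {
    std::vector<std::vector<std::string>> records;
    for (std::string line; std::getline(in, line); ) {
        std::istringstream sin(line);
        std::vector<std::string> fields;
        // std::getline() with a third argument stops at `sep` instead of '\n'.
        for (std::string field; std::getline(sin, field, sep); ) {
            fields.push_back(field);
        }
        records.push_back(fields);
    }
    return records;
}
```

Passing a std::istringstream holding "a\rb\rc\nd\re\n" should yield two records with three and two fields, respectively. Note that this loop drops an empty trailing field, i.e., a record ending in the separator does not produce a final empty string.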
Based on your description, it seems your file uses '\r' as a field separator. It may be something different, though, which is probably easiest to find out by opening the file in binary mode and printing the individual characters together with their respective codes:
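    // Needs <fstream>, <iomanip>, <iostream>, and <iterator>.
    std::ifstream in("file.csv", std::ios_base::binary);
    for (std::istreambuf_iterator<char> it(in), end; it != end; ++it) {
        // Convert via unsigned char so bytes above 127 don't print as negative codes.
        std::cout << std::setw(3)
                  << int(static_cast<unsigned char>(*it)) << ' ' << *it << '\n';
    }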
This just prints each character's code followed by the character itself. You should be able to find the value of the field separator that way, but I'd guess '\r' is being used.
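Control characters such as '\r' are hard to spot when written raw to a terminal, so a variant that prints them as escapes may be easier to read. This is only a sketch: the function name dump_bytes is hypothetical, and showing every other non-printable byte as '?' is an arbitrary choice of mine:

```cpp
#include <iomanip>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: returns one line per byte, consisting of the byte's
// decimal code (width 3) and a printable representation of the byte.
std::string dump_bytes(std::istream& in) {
    std::ostringstream out;
    for (std::istreambuf_iterator<char> it(in), end; it != end; ++it) {
        const unsigned char c = static_cast<unsigned char>(*it);
        out << std::setw(3) << int(c) << ' ';
        switch (c) {
            case '\r': out << "\\r"; break;  // carriage return, shown as \r
            case '\n': out << "\\n"; break;  // newline, shown as \n
            case '\t': out << "\\t"; break;  // tab, shown as \t
            default:
                // Printable ASCII is shown as-is; anything else becomes '?'.
                if (c >= 0x20 && c < 0x7f) out << static_cast<char>(c);
                else out << '?';
        }
        out << '\n';
    }
    return out.str();
}
```

You would pass it the std::ifstream opened with std::ios_base::binary and write the returned string to std::cout; whichever code shows up between your fields is the separator.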