1

I'm trying to read a large buffer from a socket which uses \0 to delimit pieces of data and \n to delimit lines.

I thought getline() would be an easy way to get each line but it's behaving strangely.

I'm using \n as the delimiter in getline().

string line;
string test1 = "aaa,123\nbbb\nccc,456\n";
stringstream ss1(test1);
while (std::getline(ss1, line, '\n')) {
    cout << line << endl;
}
// outputs:
// aaa,123
// bbb
// ccc,456

string test2 = "aaa\0123\0\nbbb\0\nccc\0456\0\n";
stringstream ss2(test2);
while (std::getline(ss2, line, '\n')) {
    cout << line << endl;
}
// outputs:
// aaa
// 3

Why is this happening in test2? Where is the 3 coming from? Must I remove the \0 to make this work? Is there an easier/better way to mark strings in my buffer when I do a socket recv()?

Alex_B
  • Why does the data from the socket have nulls in the lines in the first place? If it's supposed to be text, there shouldn't be embedded nulls. – Barmar Oct 03 '14 at 02:52
  • 2
    Oh, I see where the 3 comes from. The first `\0` isn't a null, it's the start of `\012`, which is a carriage return. Then the 3 follows. – Fred Larson Oct 03 '14 at 03:00
  • 1
    It's a line feed, not carriage return. CR is `\015`. LF is also the C newline character. – Barmar Oct 03 '14 at 03:01

2 Answers

3

\0 is a special character: it marks where a C-style string ends.

For example, if you write the literal "a string", the compiler automatically appends a \0, which marks the end of the string. It is legal to have a \0 in the middle of a literal, but any code that treats the data as a null-terminated C-string ignores everything after the first one.

So when a std::string is initialized from that literal, the constructor stops at the first \0 it finds, and every later operation, not just getline, sees only the text before it. That alone would leave you with "aaa". But...

As @Fred Larson points out

Oh, I see where the 3 comes from. The first \0 isn't a null, it's the start of \012, which is a carriage return. Then the 3 follows.

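So the string is actually being treated as "aaa\n3". (Octal \012 is a line feed, i.e. the newline character, not a carriage return.) getline() splits at that newline, which is why you get "aaa" and "3" as output.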
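To see what the compiler actually stores, here is a minimal sketch that dumps the bytes of a literal with the same prefix as test2. The names `kLiteral` and `describe_bytes` are made up for illustration and are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Same opening bytes as test2 in the question.
// "\0123" is parsed as the octal escape \012 (line feed) followed by '3'.
constexpr char kLiteral[] = "aaa\0123\0\nbbb";

// Hypothetical helper: render n raw bytes, spelling out LF and NUL visibly.
std::string describe_bytes(const char* data, std::size_t n) {
    assert(data != nullptr);
    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] == '\n') {
            out += "\\n";
        } else if (data[i] == '\0') {
            out += "\\0";
        } else {
            out += data[i];
        }
    }
    return out;
}
```

Calling `describe_bytes(kLiteral, sizeof(kLiteral) - 1)` yields the text `aaa\n3\0\nbbb`, and `std::strlen(kLiteral)` is 5, so a std::string built from the bare literal holds only "aaa", a newline, and "3".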

Edit: Thanks to Galik, I will also add that the rules I mention apply to string literals and C-strings. A std::string stores its length separately, so it can hold embedded \0 characters, provided it is constructed with an explicit length rather than from a bare literal.
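As a rough sketch of how that could look for the buffer in the question: build the std::string from a pointer plus a length (for socket data, the byte count returned by recv()), split on '\n', then split each line on '\0'. The function name `parse_records` is an assumption for illustration. Note that in a test literal the NULs should be written as `\000`, so that a following digit is not swallowed into the octal escape.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a raw buffer into lines on '\n',
// then split each line into fields on '\0'.
std::vector<std::vector<std::string>> parse_records(const char* data,
                                                    std::size_t len) {
    assert(data != nullptr);
    // The (pointer, length) constructor copies all len bytes, NULs included.
    const std::string buffer(data, len);
    std::istringstream lines(buffer);

    std::vector<std::vector<std::string>> records;
    std::string line;
    while (std::getline(lines, line, '\n')) {
        std::istringstream fields_in(line);
        std::vector<std::string> fields;
        std::string field;
        while (std::getline(fields_in, field, '\0')) {
            fields.push_back(field);
        }
        records.push_back(fields);
    }
    return records;
}
```

For example, with `const char raw[] = "aaa\000123\000\nbbb\000\nccc\000456\000\n";`, calling `parse_records(raw, sizeof(raw) - 1)` gives three records: {"aaa", "123"}, {"bbb"}, and {"ccc", "456"}.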

BWG
  • @Galik It's fine, I would like to know where I got it wrong though. – BWG Oct 03 '14 at 03:01
  • @Galik it's true for the string constructor from a string literal! which is important here. More generally though, an embedded NUL is placed in the array properly and it need not be ignored by code using that literal (for example, a write specifying the full length will find all the data there), it's just ASCIIZ treatment that will ignore it. – Tony Delroy Oct 03 '14 at 03:02
  • And I do recognize that something I am saying may not be correct on the technical level, but for practical purposes I feel it is correct. If its not I'll just delete it. – BWG Oct 03 '14 at 03:07
  • Actually @BWG I apologise. I actually misread your answer thinking you were talking about std::string rather than ordinary c-strings.... I'll remove my comment. – Galik Oct 03 '14 at 03:14
  • @Galik Hey, it's cool. I'll add what you said to my answer. – BWG Oct 03 '14 at 03:16
  • 2
    Actually @Fred Larson has the full explanation what's going on here. The `\0123` is being read by the compiler as `\012` (3 digit octal number) followed by `3`. The terminating null comes after that. – Galik Oct 03 '14 at 03:23
  • Well, I didn't see that. And I didn't know that would happen. Thanks. – BWG Oct 03 '14 at 03:27
0

\0 is the standard terminator for C-style strings. As such, you may either read the buffer character by character or avoid using \0 as a delimiter.
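A character-by-character scan might look like the sketch below. It assumes data arrives in chunks whose sizes do not line up with line boundaries, as is typical with recv(); the class `LineBuffer` and its members are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: accumulates raw bytes and hands back complete lines.
class LineBuffer {
public:
    // Append n bytes, e.g. the count returned by recv(); NULs are kept.
    void append(const char* data, std::size_t n) {
        assert(data != nullptr);
        pending_.append(data, n);
    }

    // Remove and return every complete '\n'-terminated line seen so far.
    // The '\n' itself is dropped; a trailing partial line stays buffered.
    std::vector<std::string> take_lines() {
        std::vector<std::string> lines;
        std::size_t start = 0;
        for (std::size_t i = 0; i < pending_.size(); ++i) {
            if (pending_[i] == '\n') {
                lines.emplace_back(pending_, start, i - start);
                start = i + 1;
            }
        }
        pending_.erase(0, start);
        return lines;
    }

private:
    std::string pending_;
};
```

Because every byte count is passed explicitly, embedded \0 characters survive inside each returned line and can be split out afterwards.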

Dr. Debasish Jana