
For reference I have already looked at Why does std::getline() skip input after a formatted extraction?

I want to understand cin and getline behavior. I am imagining cin and getline to be implemented with a loop over the input buffer, each iteration incrementing a cursor. Once the current element of the input buffer equals some "stopping" value (" " or "\n" for cin, "\n" for getline), the loop breaks.

The question I have is the difference between the reading behavior of cin and getline. With cin, it seems to stop at "\n", but it will increment the cursor before breaking from the loop. For example,

string a, b;
cin >> a;
cin >> b;
cout << a << "-" << b << endl;
// Input: "cat\nhat"
// Output: "cat-hat"

So in the above code, the first cin read up until the "\n". Once it hit that "\n", it incremented the cursor to the next position, "h", before breaking the loop. Then, the next cin operation starts reading from "h". This allows the next cin to actually process characters instead of just breaking.

When getline is mixed with cin, this is not the behavior.

string a, b;
cin >> a;
getline(cin, b);
cout << a << "-" << b << endl;

// Input: "cat\nhat"
// Output: "cat-"

In this example, the cin reads up to the "\n". But when getline starts reading, it seems to be reading from the "\n" instead of the "h". This means that the cursor did not advance to "h". So the getline processed the "\n" and advanced the cursor to the "h", but did not actually save anything to "b".

So in one example, cin seems to advance the cursor at "\n", whereas in another example, it does not. getline also exhibits different behaviors. For example

string a, b;
getline(cin, a);
getline(cin, b);
cout << a << "-" << b << endl;

// Input: "cat\nhat"
// Output: "cat-hat"

Now getline actually advances the cursor on the "\n". Why is there different behavior, and what is the actual implementation of cin vs getline when it comes to delimiter characters?

einpoklum
Jeremy Fisher
  • "So in one example, cin seems to advance the cursor at "\n", whereas in another example, it does not." No, it does not in either case. Reading from `cin` using `operator>>`, by default, skips **leading** whitespace, not trailing whitespace. As your account is over 8 years old and you have multiple gold badges, you should understand by now the [expectation for research](https://meta.stackoverflow.com/questions/261592). Questions like this one are [easily answered](https://duckduckgo.com/?q=cin+whitespace) with a search engine. – Karl Knechtel Nov 27 '21 at 19:36
  • Oh interesting. But if cin is able to distinguish between leading and trailing "\n", then why doesn't getline skip leading "\n" – Jeremy Fisher Nov 27 '21 at 19:38
  • `cin` and `getline()` do not exhibit different behaviour. Both `getline()` and formatted extraction (using operator `>>`) interact with the stream, and what you are seeing is that they interact with `cin` (and any stream) differently. They do that because they are specified differently. `operator>>()` skips white space, reads the value (if it can), and stops when it reaches whitespace. `getline()` (by default) keeps reading until it encounters a newline - and discards the newline. Using them both together can cause unexpected interactions with some user input. – Peter Nov 27 '21 at 21:19

2 Answers


reading behavior of cin and getline.

cin does not "read" anything. cin is an input stream. cin is getting read from. getline reads from an input stream. The formatted extraction operator, >>, reads from an input stream. What's doing the reading is >> and std::getline. std::cin does no reading of its own. It's what's being read from.

first cin read up until the "\n". once it hit that "\n", it increments the cursor to the next position

No, it doesn't. The first >> operator reads up until the \n, but does not read it. \n remains unread.

The second >> operator starts reading with the newline character. The >> operator skips all whitespace in the input stream before it extracts the expected value.

The detail that you're missing is that >> skips whitespace (if there is any) before it extracts the value from the input stream, and not after.

Now, it is certainly possible that >> finds no whitespace in the input stream before extracting the formatted value. If >> is tasked with extracting an int, and the input stream has just been opened and it's at the beginning of the file, and the first character in the file is a 1, well, the >> just doesn't skip any whitespace at all.

Finally, std::getline does not skip any whitespace; it just reads from the input stream until it reads a \n (or reaches the end of the input stream).

Sam Varshavchik
  • Right, but that's my confusion. If ">>" reads until "\n" but doesn't read "\n", then that means subsequent uses of ">>" must be skipping leading "\n". Otherwise the cursor would be stuck at "\n". But getline doesn't seem to be skipping LEADING "\n" (newline, not whitespace). So getline reads the "\n" but then stops. So subsequent getline operations start reading from the actual characters. – Jeremy Fisher Nov 27 '21 at 19:44
  • Your understanding is accurate. `\n` is considered whitespace, and everything in my answer is consistent with your comment. What led you to expect that `std::getline` skips leading whitespace? It doesn't, of course. `>>` skips whitespace, `getline` does not. They are different functions. They do different things. That's why they exist: they do different things. – Sam Varshavchik Nov 27 '21 at 19:46
  • @JeremyFisher • a newline **is** whitespace. – Eljay Nov 27 '21 at 19:46
  • @Eljay that was my confusion. I didn't consider "\n" whitespace. "\n" to me is just a special character representing "go to the next line". "whitespace" to me meant " " – Jeremy Fisher Nov 27 '21 at 19:49
  • @SamVarshavchik one thing then, if getline doesn't skip leading whitespace, why do 2 back to back getline operations read the lines successfully? Wouldn't the first getline operation read the characters up to "\n" but then the 2nd would read the "\n" and stop? – Jeremy Fisher Nov 27 '21 at 19:51
  • No, `getline` reads the `\n`, it just doesn't put it into the `std::string`. `std::getline` reads up to ***and including*** the next `\n`, and puts everything it reads, ***except*** the newline into the string. – Sam Varshavchik Nov 27 '21 at 19:54
  • Got it, that makes sense now. Thanks! – Jeremy Fisher Nov 27 '21 at 19:54
  • @JeremyFisher • The [whitespace](https://en.cppreference.com/w/cpp/string/byte/isspace) characters: `\t \n \v \f \r` and space. – Eljay Nov 27 '21 at 20:07
  • The term "whitespace" includes space characters, newline, vertical and horizontal tab, form feed, and carriage return. `std::isspace('\n')` and `std::isspace(' ')` both return a non-zero value. – Peter Nov 27 '21 at 21:23

tl;dr: it's because extraction with std::cin >> is intra-line-oriented, while getline is line-oriented.

Historically, in C, we had the functions scanf() (in the standard library) and getline() (a GNU extension, later standardized in POSIX):

  • When you tell scanf() to expect a string, it

    ... stops at white space or at the maximum field width, whichever occurs first.

    and more generally,

    Most conversions [e.g. readings of strings] discard initial white space characters

    (from the scanf() man page)

  • When you call getline(), it:

    reads an entire line ... the buffer containing the text ... includes the newline character, if one was found.

    (from the getline() man page)

Now, C++'s std::cin mechanism replaced scanf() for formatted input matching, but with type safety. (Actually std::cin and std::cout are quite problematic as replacements, but never mind that now.) As a substitute for scanf(), it inherits many of its features, including being averse to picking up white space.

Thus, just like scanf(), running std::cin >> a for a string a will stop before a \n character, and keep that line break in the input stream for future use. Also, just like scanf(), std::cin's >> operator skips leading whitespace, so if you use it a second time, the \n will be skipped, and the next string picked up starting from the next line's first non-whitespace character.

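With std::getline(), you get essentially the getline() behavior of decades past, with one difference: std::getline() extracts the trailing \n from the stream but does not store it in the string.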


PS - you can control the whitespace-skipping behavior using the skipws format-flag of std::cin

einpoklum
  • this is a good answer as well. One question - if I had a string with a bunch of newlines, like "test string\n\n\n\n\n\n\n\n\n second line", would 2 getlines back to back read the strings properly? Because first getline should read all the "\n" to " " before "second line"? – Jeremy Fisher Nov 27 '21 at 20:00
  • @JeremyFisher: You would need as many getline's as you have `\n`'s, since those consecutive `\n` designate ends of empty lines. An empty line is a line. Anyway, consult the man page or, for the C++ version, the [cppreference page](https://en.cppreference.com/w/cpp/string/basic_string/getline). – einpoklum Nov 27 '21 at 20:04