Warning: extremely boring answer ahead
C++14 [istream.formatting.arithmetic] ¶3
operator>>(int& val);
The conversion occurs as if performed by the following code fragment (using the same notation as for
the preceding code fragment):
typedef num_get<charT,istreambuf_iterator<charT,traits> > numget;
iostate err = ios_base::goodbit;
long lval;
use_facet<numget>(loc).get(*this, 0, *this, err, lval);
if (lval < numeric_limits<int>::min()) {
  err |= ios_base::failbit;
  val = numeric_limits<int>::min();
} else if (numeric_limits<int>::max() < lval) {
  err |= ios_base::failbit;
  val = numeric_limits<int>::max();
} else
  val = static_cast<int>(lval);
setstate(err);
The grunt work here is done by num_get::get, which is specified at [facet.num.get.members] ¶1:
iter_type get(iter_type in, iter_type end, ios_base& str,
ios_base::iostate& err, long& val) const;
[...]
Returns: do_get(in, end, str, err, val).
do_get in turn is specified immediately afterwards ([facet.num.get.virtuals]), in excruciating detail covering the exact workings of the whole shebang. I won't copy three pages' worth of pain, just the main points.
In stage 1, an "equivalent stdio format specifier" is determined according to the stream flags, as per Tables 85 and 86; the default flags of a stream are dec | skipws, so we'll follow that path (which corresponds to %d). Some other locale- and flag-specific characters are also determined for the next stage.
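To make the role of the flags concrete, here is a small sketch of how the basefield setting changes what the same characters mean. `read_with_base` is a hypothetical helper of mine; passing an empty `fmtflags` value clears basefield entirely, which is the case the standard maps to %i rather than %d.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: reads one int from `text` after setting the stream's
// basefield to `base` (dec, oct, hex, or an empty fmtflags to clear it).
int read_with_base(const std::string& text, std::ios_base::fmtflags base) {
    std::istringstream in(text);
    in.setf(base, std::ios_base::basefield);  // clears basefield, then sets `base`
    int value = 0;
    in >> value;
    return value;
}
```

With the default dec, "010" should read as ten; with oct, or with basefield cleared, the same input should read as eight. That is why the rest of this answer only needs to care about %d.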
In stage 2, characters are read from the stream and accumulated in a buffer; the essential point for your question is that:
If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated.
So, the decision whether to keep reading your zeroes or stop after a single zero depends on the %d above; we'll get back to it.
In stage 3, the accumulated characters are finally converted to a long by the rules of one of the functions declared in the header <cstdlib>:
- For a signed integer value, the function strtoll.
Both the %d specifier and strtoll are defined in the C standard (C++14 refers to C99); let's dig them up.
C99 §7.19.6.2 ¶12 (in the description of fscanf) says:
d
Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtol function with the value 10 for the base argument.
So it all boils down to strtol/strtoll, which we can find at C99 §7.20.1.4. It specifies that leading whitespace is skipped, and then the "subject sequence" is considered:
If the value of base is zero, the expected form of the subject sequence is that of an integer constant as described in 6.4.4.1, optionally preceded by a plus or minus sign, but not including an integer suffix. If the value of base is between 2 and 36 (inclusive), the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base, optionally preceded by a plus or minus sign, but not including an integer suffix. The letters from a (or A) through z (or Z) are ascribed the values 10 through 35; only letters and digits whose ascribed values are less than that of base are permitted. If the value of base is 16, the characters 0x or 0X may optionally precede the sequence of letters and digits, following the sign if present.
The subject sequence is defined as the longest initial subsequence of the input string, starting with the first non-white-space character, that is of the expected form. The subject sequence contains no characters if the input string is empty or consists entirely of white-space, or if the first non-white-space character is other than a sign or a permissible letter or digit.
If the subject sequence has the expected form and the value of base is zero, the sequence of characters starting with the first digit is interpreted as an integer constant according to the rules of 6.4.4.1. If the subject sequence has the expected form and the value of base is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value as given above. If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).
(ibidem, ¶3-5)
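The "longest initial subsequence" rule can be checked directly against strtol. A minimal sketch follows; `parse_decimal` is a hypothetical helper that reports both the converted value and how many characters the call consumed (skipped whitespace included).

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical helper: parses `text` with std::strtol in base 10 and returns
// the value plus the number of characters consumed from the start of `text`.
std::pair<long, std::size_t> parse_decimal(const std::string& text) {
    const char* begin = text.c_str();
    char* end = nullptr;
    long value = std::strtol(begin, &end, 10);  // end points past the subject sequence
    return {value, static_cast<std::size_t>(end - begin)};
}
```

For example, `parse_decimal("000123abc")` should yield 123 with 6 characters consumed: the three zeroes are simply digits of the subject sequence, and parsing stops only at the first character that is not a valid base-10 digit.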
As you can see, there are no special provisions for leading zeroes; if it is a valid digit, it goes in the subject sequence, to be processed all in the same batch.
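If you'd rather see it than read standardese, here is a minimal sketch of the same conclusion at the stream level, assuming default stream flags; `read_all_ints` is a hypothetical helper for the demonstration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extracts ints from `text` with operator>> until
// extraction fails, and returns them in order.
std::vector<int> read_all_ints(const std::string& text) {
    std::istringstream in(text);
    std::vector<int> values;
    int value = 0;
    while (in >> value) {
        values.push_back(value);
    }
    return values;
}
```

Feeding it "0001 000 42" should produce exactly three values (1, 0 and 42): each run of zeroes is swallowed as part of a single number rather than being split into several zeroes.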