23

I would expect the following to leave the buf_iter pointing to the character n characters after the point at which it started. Instead it is left pointing at the last character read. Why is this? i.e. if I do in_stream.tellg() before and after the copy_n, they differ not by n but by (n-1). Had I read n characters with in_stream.read, then the position would be advanced by n.

std::istreambuf_iterator<char> buf_iter(in_stream);
std::copy_n(buf_iter, n, sym.begin());

I've looked at the implementation and it clearly does this on purpose, skipping the final increment.

Another post here mentions that incrementing the from iterator when it is hooked up to, say, cin, will cause one too many reads since a read is done on operator++(). That sounds like an issue with cin - why isn't the read done on operator*()?

Does the standard specify this anywhere? The docs I've seen don't mention what happens to the from iterator, and I've seen two different pages that give "possible correct implementations" that do each of the behaviours:

At cppreference we have:

template< class InputIt, class Size, class OutputIt>
OutputIt copy_n(InputIt first, Size count, OutputIt result)
{
    if (count > 0) {
        *result++ = *first;
        for (Size i = 1; i < count; ++i) {
            *result++ = *++first;
        }
    }
    return result;
}

while at cplusplus.com we have:

template<class InputIterator, class Size, class OutputIterator>
  OutputIterator copy_n (InputIterator first, Size n, OutputIterator result)
{
  while (n>0) {
    *result = *first;
    ++result; ++first;
    --n;
  }
  return result;
}

Both do n reads and result in the same contents in result. However, the first will only increment the "first" iterator n-1 times, and the second will increment it n times.

What gives? How do I write portable code? I can use tellg and then seekg but then I might as well just do the loop by hand (ugh!).


Note that I'm not trying to read from the iterator after calling copy_n, rather I want to read from the underlying stream after calling copy_n, and the problem is that copy_n is left pointing on byte short of where I had expected it to be. For now I'm going with the somewhat hideous but apparently portable:

auto pos = in_stream.tellg();
std::istreambuf_iterator<char> buf_iter(in_stream);
std::copy_n(buf_iter, cl, sym.begin());

in_stream.seekg(pos + cl);

uint64_t foo;
in_stream.read(reinterpret_cast<char *>(&foo), 8);

BTW, in case its not clear, I'm trying to avoid copying the data into a buffer and then again into the string sym.


@DaveS: Moving out of my specific problem, here is a simple program that does not output what I would expect due to the fact that the input iterator isn't incremented the final time:

#include <algorithm>
#include <string>
#include <iostream>
#include <fstream>

int main(int argc, const char * argv[])
{
    std::ifstream in("numbers.txt");

    std::istreambuf_iterator<char> in_iter(in);
    std::ostreambuf_iterator<char> out_iter(std::cout);

    std::copy_n(in_iter, 3, out_iter);
    std::cout << std::endl;

    std::copy_n(in_iter, 3, out_iter);
    std::cout << std::endl;

    std::copy_n(in_iter, 3, out_iter);
    std::cout << std::endl;

    return 0;
}

The input file is just "0123456789\n"

I'm getting:

012
234
456

Because of the side-effect of istreambuf_iterator::operator++(), this would give a different result if copy_n was implemented to increment the input iterator n times.


@aschepler: Need to capture the local parameter, but I'm going with it:

 std::generate_n(sym.begin(), cl, [&in_stream](){ return in_stream.get(); });
Community
  • 1
  • 1
sfjac
  • 7,119
  • 5
  • 45
  • 69
  • 2
    Or how about: `std::generate_n(sym.begin(), cl, [](){ return in_stream.get(); });`? – aschepler Apr 25 '14 at 19:50
  • 1
    I asked the C++ standard discussion mailing list. Maybe someone can explain the rationale. Topic here: https://groups.google.com/a/isocpp.org/forum/#!topic/std-discussion/UdnfYg7HPjg – Brian Bi Apr 25 '14 at 20:02

3 Answers3

9

The reason why many std::copy_n implementations increment n-1 times is due to the interactions with istream_iterator, and how that is typically implemented.

For example, if you had an input file with integers in them

std::vector<int> buffer(2);
std::istream_iterator<int> itr(stream); // Assume that stream is an ifstream of the file
std::copy_n(itr, 2, buffer.begin());

Because istream_iterator is specified to read on increment (and on either construction or first dereference), if std::copy_n incremented the input iterator 2 times, you would actually read 3 values out of the file. The third value would simply be discarded when the local iterator inside copy_n went out of scope.

istreambuf_iterator doesn't have the same interactions, since it doesn't actually copy the value from the stream into a local copy, like most istream_iterators do, but copy_n still behaves this way.

Edit: Example of losing data if copy-N incremented N times (the cplusplus.com description, which doesn't seem to be correct). Note, this really only applies to istream_iterators or other iterators that reads and remove their underlying data on increment.

std::istream_iterator<int> itr(stream); // Reads 1st value

while(n > 0) // N = 2 loop start 
{       
 *result = *first;
 ++result; ++first; // Reads 2nd value
 --n; // N: 1
 // N = 1 loop start
 *result = *first;
 ++result; ++first; // Reads 3rd value
 --n; // N :0
 // Loop exit
}
return result;
Dave S
  • 20,507
  • 3
  • 48
  • 68
  • I don't think this is true - the first read is done before an increment, so as it stands this experiment will result in the last character from the first read also being the first character from the second read, I think. In particular, calling `copy_n(input, 1, output)` in a loop would just cause it to repeatedly write the same (current) character to the output and never call `operator++()` on the input iterator. – sfjac Apr 25 '14 at 20:19
  • @sfjac: I reworded to remove the concept of a loop (since it confused me right after I came back to it). If you use `std::istream_iterator` and caused copy_n to increment the input iterator N times, you would lose a character from your file. I can walk through that in detail if you'd like – Dave S Apr 25 '14 at 20:21
  • Interesting rationale. Consider a file `1 2 hello`. Incrementing the iterator twice would set the failbit etc. – dyp Apr 25 '14 at 20:34
  • 1
    @dyp: Reasons like this are why I personally avoid the `istream_iterator` and `istreambuf_iterator` unless I'm reading the entire file/buffer. I find that trying to fit the abstraction of iterators over the files gets me into trouble in anything beyond the simplest of cases. – Dave S Apr 25 '14 at 20:39
  • 1
    @dyp: [And here's an update](http://coliru.stacked-crooked.com/a/726ebbc01f233ab1) that shows that if you increment N times, it reads too much. – Dave S Apr 25 '14 at 20:44
  • See my added example. – sfjac Apr 25 '14 at 20:56
  • @sfjac: With your updates, the short answer is because it's left unspecified exactly how many times the input iterator will be incremented. The simple program relies on too much on how the algorithm is implemented to be safe. It relies on the number of increments performed by an algorithm which doesn't specify the number of increments that will be performed. – Dave S Apr 25 '14 at 21:09
  • 2
    Bottom line to me is that this violates the principle of least surprise and thus seems like a bug, whether in the spec or the implementation, I don't really care. Users shouldn't have to care about the details of when the read happens - that's the point of abstractions. :) – sfjac Apr 26 '14 at 16:41
  • 1
    @sfjac: Which is less suprising: Passing an `istream_iterator` to copy N and having it remove N+1 items from the input (losing one permenantly), or passing an `istreambuf_iterator` iterator and having it leave an extra on the buffer to be removed manually? While both are not quite right, one results in loss of data. – Dave S Apr 26 '14 at 19:43
  • @DaveS: I'd say they're both bugs. – sfjac Apr 26 '14 at 20:58
8

n3797 [algorithms.general]/12

In the description of the algorithms operators + and - are used for some of the iterator categories for which they do not have to be defined. In these cases the semantics of a+n is the same as that of

X tmp = a;
advance(tmp, n);
return tmp;

and that of b-a is the same as of

return distance(a, b);

[alg.modifying.operations]

template<class InputIterator, class Size, class OutputIterator>
OutputIterator copy_n(InputIterator first, Size n,
                      OutputIterator result);

5 Effects: For each non-negative integer i < n, performs *(result + i) = *(first + i).

6 Returns: result + n.

7 Complexity: Exactly n assignments.


I'm not sure this is well-formed for InputIterators (no multipass), since it does not modify the original iterator, but always advances a copy of the original iterator. It doesn't seem to be efficient either.

[input.iterators]/Table 107 - Input iterator requirements (in addition to Iterator)

Expression: ++r
Return type: X&
pre: r is dereferencable.
post: r is dereferenceable or r is past-the-end.
post: any copies of the previous value of r are no longer required either to be dereferenceable or to be in the domain of ==.

As far as I can see, the a in

X tmp = a;
advance(tmp, n);
return tmp;

is therefore no longer required to be incrementable.


Related defect report: LWG 2173

dyp
  • 38,334
  • 13
  • 112
  • 177
  • I agree this is weird and should be a library defect. Several other algorithm requirements have the same problem. – aschepler Apr 25 '14 at 19:37
  • 1
    While I fully agree with the observation, I fail to see how this relates to the question. It's true that `std::copy_n` invalidates the input iterator (and any backup copy you might have too). It appears to be a black hole. But the fact is that the calls of `operator++` should be considered observable behavior (you may pass your own). – MSalters Apr 25 '14 at 20:21
  • 1
    @MSalters Not sure what your criticism is. I think the spec of `copy_n` is broken relating to calls to `operator++`. So whether it should be N or N-1 calls cannot really be answered since no implementer probably used the spec literally. If they did, for `i == n-1`, `first` would be incremented `n-1` times, and there's no increment after that. Therefore I *suspect* the correct answer is *N-1*, but that depends on how the spec is fixed. – dyp Apr 25 '14 at 20:28
  • gcc changed its mind about copy_n at one point: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50119 – Cubbi Apr 25 '14 at 21:23
3

The source iterator is not taken by reference. So, a copy of it is incremented n times, but the parameter is left untouched.

9 out of 10 times, this is what you want.

As far as the side-effects of incrementing specifically on InputIterators is concerned I think officially, input iterators shall be "incremented" upon each read (repeated read without increment does not yield the same value). So, just make the increment a no-op.

sehe
  • 374,641
  • 47
  • 450
  • 633
  • 2
    While that's true, the `istreambuf_iterator` type has shallow copy semantics. The question is not "why is the iterator not updated?" as much as "why is the iterator updated n - 1 times and not n times?" – templatetypedef Apr 25 '14 at 19:14
  • @templatetypedef does the second half of my answer make sense to you? (I'll see whether I can find a reasonably authoritative source on that, but conceptually it makes a bit of sense). And I don't think InputIterators have "shallow copy semantics". They have non-deterministic results, which is exactly what you'd expect with the concept of an InputIterator. (The iterator is by value, but ["you can't step into the same stream a second time"](http://en.wikiquote.org/wiki/Heraclitus)) – sehe Apr 25 '14 at 19:15
  • @sehe: You can't use a copy of an old position of an Input Iterator in general, but `std::istreambuf_iterator` is well-defined as allowing copies to point at the same buffer, doing `sbumpc` on increment and doing `sgetc` on dereference. – aschepler Apr 25 '14 at 19:23
  • I don't think that `(repeated read without increment does not yield the same value)` is the case for any standard specified input iterator. Even `istream_iterator` will return the same value repeatedly until you increment. – Dave S Apr 25 '14 at 20:31
  • 1
    @DaveS The iterator is not _required_ to return different values, but it is _allowed_ to for InputIterator. The stronger concept is ForwardIterator (which explicit allows re-reading an iterator, even when a copy has been advanced). – sehe Apr 25 '14 at 20:34