Using an iterator to Divide an Array into Parts with Unequal Size

Question

I have an array which I need to divide up into 3-element sub-arrays. I wanted to do this with iterators, but I end up iterating past the end of the array and segfaulting even though I don't dereference the iterator. given: auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }; I'm doing:

auto bar = cbegin(foo);

for (auto it = next(bar, 3); it < foo.end(); bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, cend(foo), [](const auto& i) { cout << i << endl; });

Now I can solve this by defining a finish iterator:

auto bar = cbegin(foo);
auto finish = next(cend(foo), -(size(foo) % 3));

for (auto it = next(bar, 3); it != finish; bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, finish, [](const auto& i) { cout << i << endl; });
for_each(finish, cend(foo), [](const auto& i) { cout << i << endl; });

But this seems unnecessary when I don't dereference the iterator. Why can't I do the first version?

that's what the standard says: you cannot obtain an iterator outside [begin, end]. Additionally, you cannot dereference the end iterator. This rule is an extension of pointers: you cannot obtain a pointer that doesn't point to an object or array or 1 past the last element of an array. — bolov, Apr 05 '16 at 11:46
@bolov Do you have a source for that? I mean it's just a number in an `int` till I dereference it, right? — Jonathan Mee, Apr 05 '16 at 11:49
You algorithm seems to depend on a random access iterator, you might use an index (size_r) and operator [] instead. — , Apr 05 '16 at 11:50
@DieterLücking I actually had also written up an index version, which seems more hopeful. In that at least it isn't segfaulting :( — Jonathan Mee, Apr 05 '16 at 11:52
I'm sure someone will come and add standard quotes. As for reasons: C++ is thought to be as generic as possible, it has to work on all sorts of crazy unthinkable architectures. The ideea is that you have to ask yourself what happens for instance when your array is near the end of the memory addressable space getting `last + 10` would not be pointing to an invalid memory, but would not make sense, as let's say `last + 5` is the last memory address. So the standard says it's **undefined behavior** — bolov, Apr 05 '16 at 11:53
@bolov You make a good point about exceeding the size of an `int`. But we know that C++ has to be able to return an `end` pointer which means that it cannot allocate space in the very last line of accessible memory. At a maximum it could only represent the allocate on the *next* to last line. Meaning that as long as my modulo number is smaller than `sizeof(int*) * 8` there shouldn't be any undefined behavior. — Jonathan Mee, Apr 05 '16 at 12:08
@JonathanMee I don't follow you. It is required that an iterator/pointer to one past the last element of an array must be valid. Past `last` there is no guarantee, Obtaining one past `last` is undefined behavior. And I think you confuse things a bit. I don't see any relation to `sizeof(int*) * 8`. I say again: `last` can hypothetically be a pointer to the last addressable memory for an object type stored by the array. Going past `last` is undefined behavior. — bolov, Apr 05 '16 at 12:33
Also I was careful not talking about `int`. There are `pointers` and `iterators` and then there is `int`. The standard only guarantees that a pointer can be converted to an `intptr_t` and then back again to a pointer and you get the original pointer value. It doesn't even make guarantees about the `intptr_t` value. — bolov, Apr 05 '16 at 12:35
@bolov I mean we are talking about an `int` here (or a `long` to be precise.) Pointers are simply a memory index that boils down to a unique number, hence we can compare `int*`s just like `int`s, and that's what I'm doing here. — Jonathan Mee, Apr 05 '16 at 12:38
@JonathanMee You would think so. I did to. But no. We are not talking about integers. We are talking about pointers. And even though on most architectures on most compilers pointers are implemented via an integer type, the standard doesn't make this connection. For instance. If you have `int a; int b` it is undefined behavior to compare the pointers to `a` and `b`, i.e. `&a < &b` is undefined behavior, even if a `int *` is just an integer underneath. There are rules for integers and then there are rules for pointers. For better or worse that's how C++ was designed. — bolov, Apr 05 '16 at 12:44
From § 5.9 of the C++11 standard: "If two pointers p and q of the same type point to different objects that **are not members of the same object or elements of the same array** or to different functions, or if only one of them is null, the results of pq, p<=q, and p>=q are unspecified.". My bad, it's not undefined, it's unspecified. — bolov, Apr 05 '16 at 12:53

score 2 · Answer 1 · edited May 23 '17 at 11:52

2

The segfault you are seeing is coming from next checking the range for you is an assertion in your Debug implementation to check against undefined behavior. The behavior of iterators and pointers is not defined beyond the their allocated range, and the "one past-the-end" element: Are iterators past the "one past-the-end" iterator undefined behavior?

This means that incrementing past the "one past-the-end" element is undefined behavior independent of the iterator's subsequent use. In order to have defined behavior you must use a solution like your Integer Modulo algorithm or similar, but you will have to change auto it = next(bar, 3) to something that conditionalizes based on the availability of at least the size of your sub-array, so something like: auto it = size(foo) <= 3 ? finish : next(bar, 3).

Where available the best solution here is going to cause the least redundant iteration is to track the size remaining in the container as an integer which does not suffer from undefined behavior when it falls outside the range and "one past-the-end". This can be accomplished by:

auto bar = cbegin(foo);

for (auto i = size(foo); i > STEP; i -= STEP) {
    for(auto j = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
    cout << endl;
}

for(auto i = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
cout << endl;

EDIT: I had previously suggested using pointers which are not Debug conditioned, this is undefined behavior.

The problem is that next is checking the range for you. We use pointers outside of allocated memory all the time, for example nullptr and end, and that's all it here is. If you just use C-style pointer arithmetic here you'll be fine:

auto bar = cbegin(foo);

for (auto it = bar + 3; it < cend(foo); bar = it, it = bar + 3) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, cend(foo), [](const auto& i) { cout << '\t' << i << endl; });

Live Example

Alternatively, if you run in Release configuration the range checks should be removed, so you will be able to use the first version of your code.

edited May 23 '17 at 11:52

Community

1
1

answered Apr 05 '16 at 11:58

Jonathan Mee

37,899
23
129
288

This still may invoke UB. from [iterator.requirements](5): [...]Iterators can also have singular values that are not associated with any sequence.[...] Results of most expressions are undefined for singular values[...] – NathanOliver Apr 05 '16 at 12:07
@NathanOliver "Iterators can also have singular values that are not associated with any sequence." Just as `end` is associated with this sequence `it` is associated with this sequence, they're just both, past the end `iterator`s, which are not dereferenceable, but I don't think that either qualifies as a "Singular Value". – Jonathan Mee Apr 05 '16 at 12:31
I am not sure. I do not know if 2 past then end is less valid then 1 past the end. there is: *Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values.* – NathanOliver Apr 05 '16 at 12:34
@NathatOliver Yeah that's what I was referencing here: http://stackoverflow.com/questions/36425393/using-an-iterator-to-divide-an-array-into-parts-with-unequal-size/36425799?noredirect=1#comment60466113_36425393 But you're point is well made, depending on the size of my partition `bar + #` may overflow the integral value in which case this wouldn't work. If you want to type that up, I'll accept, otherwise, I'll just correct my answer. – Jonathan Mee Apr 05 '16 at 12:45
The problem isn't that the debug configuration does a range check. The problem is that you have undefined behavior. – bolov Apr 05 '16 at 12:47
1

This has nothing to do with overflowing. If `last` is one past then end of the array then is `last + 1` still considered a past-the-end iterator or is it a singular iterator and UB? I am still searching for that. – NathanOliver Apr 05 '16 at 12:49
1

I found it. see [this](http://stackoverflow.com/a/33675281/4342498) answer. Not sure if this actually qualifies as a dupe or not. – NathanOliver Apr 05 '16 at 12:54
@NathanOliver Excellent find sir. It's annoying that it calls out a `string`. More annoying that the answer directly references `string` but it seems very relevant... – Jonathan Mee Apr 05 '16 at 12:57
1

@JonathanMee No problem. As you can see from my comment I am not 100% sure if the answer is correct or not. 8 other people agreed with it but it is not accepted so there is that. I am still trying to find something in the standard. – NathanOliver Apr 05 '16 at 13:07
@NathanOliver I don't think that you'll find anything in the standard. I think that arithmetic on pointers is always legal, it's dereferencing them that's only legal within an allocated range. Here's the thing though, I don't know where in memory this is allocated. If it's at the upper bound of what I can index, and my increment pushes the index past that upper bound I think that's when it would be a problem. – Jonathan Mee Apr 05 '16 at 13:53
@bolov I have come around to your way of thinking and updated the answer. Thanks for holding me to the standard. – Jonathan Mee May 16 '16 at 12:30
@NathanOliver I finally got standard requirement we were looking for. I've linked it and updated my answer. – Jonathan Mee May 16 '16 at 12:31

score 1 · Accepted Answer · edited May 23 '17 at 12:31

The reason this is prohibited is covered well at your other question Are iterators past the "one past-the-end" iterator undefined behavior? so I'll just address improved solutions.

For random-access iterators (which you must have if you are using <), there's no need whatsoever for the expensive modulo operation.

The salient points are that:

it + stride fails when it nears the end
end() - stride fails if the container contains too few elements
end() - it is always legal

From there, it's simple algebraic manipulation to change it + stride < end() into a legal form (subtract it from both sides).

The final result, which I have used many times:

for( auto it = c.cbegin(), end = c.cend(); end - it >= stride; it += stride )

The compiler is free to optimize that back to comparison to a precomputed end - stride * sizeof(*it) if the memory model is flat -- the limitations of C++ behavior don't apply to the primitive operations which the compiler translates C++ into.

You may of course use std::distance(it, end) if you prefer to use the named functions instead of operators, but that will only be efficient for random-access iterators.

For use with forward iterators, you should use something that combines the increment and termination conditions like

struct less_preferred { size_t value; less_preferred(size_t v) : value(v){} };

template<typename Iterator>
bool try_advance( Iterator& it, less_preferred step, Iterator end )
{
     while (step.value--) {
         if (it == end) return false;
         ++it;
     }
     return true;
}

With this additional overload, you'll get efficient behavior for random-access iterators:

template<typename RandomIterator>
auto try_advance( RandomIterator& it, size_t stride, RandomIterator end )
     -> decltype(end - it < stride) // SFINAE
{
     if (end - it < stride) return false;
     it += stride;
     return true;
}

OK, so you're saying that one integer modulo is going to be more expensive than `c.size() / stride` subtractions? Sounds fishy. — Jonathan Mee, May 17 '16 at 16:42
@JonathanMee: At runtime the `it != final` is going to be executed as `it - final != 0` anyway; I expect the optimizer to make `end - it >= stride` just as efficient. — Ben Voigt, May 17 '16 at 20:24
A couple comments: 1) `try_advance` can't compile, even after I edited away the bugs, because random access iterators match both specializations 2) Using "-O3" on gcc `it != final` generates only a comparison, *not* a subtraction, while `end - it >= stride` does incur a subtraction each call which I expect makes it more expensive. I've tried to shrink the supporting data into a comment, but I can't even, so I've added it as a new answer: http://stackoverflow.com/a/37299761/2642059 I'm not sure how relevant this is to the question so I'll probably just delete it after we finish discussing. — Jonathan Mee, May 18 '16 at 12:46

score 0 · Answer 3 · edited May 23 '17 at 12:31

There is some disagreement about the most effective way to accomplish this iteration through array partitions.

First the one time integer modulo method, this must define auto size in addition to the changes in my answer because gcc does not yet support size:

auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };  
auto size = distance(cbegin(foo), cend(foo));
auto bar = cbegin(foo);
auto finish = prev(cend(foo), size % 3);

for(auto it = size <= 3 ? cend(foo) : next(bar, 3); it != finish; bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

for_each(bar, finish, [](const auto& i) { cout << i << '\t'; });
cout << endl;
for_each(finish, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;

This creates 112 lines of assembly, most notably the conditional it != finish generates these instructions:

cmpq    %r12, %r13
je      .L19
movq    %r12, %rbx
jmp     .L10

Second the repeated iterator subtraction using Ben Voigt's try_advance but only with the random access specialization because there is a compiler conflict for random access iterators:

auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };  
auto bar = cbegin(foo);

for (auto it = cbegin(foo), end = cend(foo); try_advance(it, 3, end); bar = it) {
    for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;

This creates 119 lines of assembly, most notably the conditional in try_advance: if (end - it < stride) return false; incurs a per iteration generating the code:

movq    %r12, %rax
subq    %rbp, %rax
cmpq    $11, %rax
ja      .L3

Upon learning that cmpq is really just a subtract and compare operation I have written some bench-marking code: http://coliru.stacked-crooked.com/a/ad869f69c8dbd96f I needed to use Coliru to be able to turn on optimization, but it keeps giving me bogus increments of my test count for times, I'm not sure what's going on there. ~~What I can say is locally, the repeated iterator subtraction is always faster, sometimes significantly so. Upon learning this I believe that Ben Voigt's answer should be marked as the correct one.~~

EDIT:

I've made an interesting discovery. It's the algorithm that goes first that always looses. I've rewriten the code to swap the first algorithm on each pass. When this is done the integer modulo method always beats the iterator subtraction method as would be suspected by looking at the assembly, again something fishy is going on with Coliru, but you can take this code and run it locally: http://coliru.stacked-crooked.com/a/eb3e0c70cc138ecf

The next issue is that both of these algorithms are lazy; in the event that size(foo) is a multiple of 3 they allocate an empty vector at the end of the vector. That requires significant branching for the integer modulo algorithm to remedy, but only the simplest of changes for the repeated iterator subtraction algorithm. The resulting algorithms exhibit effectively equal benchmark numbers but the edge goes to the repeated iterator subtraction for simplicity:

Integer modulo algorithm:

auto bar = cbegin(foo);
const auto size = distance(bar, cend(foo));

if (size <= 3) {
    for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}
else {
    auto finish = prev(cend(testValues), (size - 1) % 3 + 1);

    for (auto it = next(bar, 3); it != finish; bar = it, advance(it, 3)) {
        for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
        cout << endl;
    }

    for_each(bar, finish, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
    for_each(finish, cend(foo), [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

Repeated iterator subtraction algorithm:

auto bar = cbegin(foo);

for (auto it = cbegin(foo); distance(it, cend(foo)) > 3; bar = it) {
    advance(it, 3);
    for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;

EDIT: Throwing the Remaining Size Algorithm into the hat

Both the Integer Modulo and Repeated Subtraction Algorithms above suffer from iterating over the input sequence more than once, other than being slower this isn't that serious because currently we're using a Bidirectional Iterator, but should our input iterator fail to qualify for Bidirectional Iterator this would be excessively expensive. Independent of iterator type the Remaining Size Algorithm beats all challengers every time at 10,000,000+ testbench iterations:

auto bar = cbegin(foo);

for (auto i = size(foo); i > STEP; i -= STEP) {
    for(auto j = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
    cout << endl;
}

for(auto i = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
cout << endl;

I've again copied my local testing to Coliru, which gives weird results but you can verify locally: http://coliru.stacked-crooked.com/a/361f238216cdbace

Btw cmp instruction is "subtract and set flags (discard result)". What optimization level is this? — Ben Voigt, May 18 '16 at 13:18
@BenVoigt Thanks for the info. I've added some bench-marking code, and you can imagine my surprise that even though your code always compiles to larger assembly code, it is faster, consistently. (Though I have no idea what's up with Coliru?) I want to accept your answer, but can you clean it up so that the `try_advance` doesn't fail to compile if I include both specializations? — Jonathan Mee, May 18 '16 at 13:40
@BenVoigt Nope, doesn't work. [It has been my experience](http://stackoverflow.com/q/37075331/2642059) that varadic arguments can only break ties if they are required by the call. You can't even use them to tie break against defaulted arguments. — Jonathan Mee, May 18 '16 at 17:18
Ok, fixed by having a user-defined conversion needed for the less favorable overload. — Ben Voigt, May 18 '16 at 17:47
@BenVoigt I need to be able to use `distance(it, cend(foo)) > 3` in order to prevent an extra allocation which makes this algorithm competitive with the integer modulo algorithm. But that makes the performance dreadful for forward iterators. Is there anything that you know of that can be done here? By the way, it makes an awful lot of sense to use iterator traits since we're using iterators. — Jonathan Mee, May 19 '16 at 10:33
@BenVoigt OK I have a new challenger, which seems to beat all comers [my answer has been updated](http://stackoverflow.com/a/36425799/2642059) with the Repeated Subtraction Algorithm. This algorithm also gets around the problems with Forward Iterators. Comments? — Jonathan Mee, May 24 '16 at 12:06
Your last coliru code (http://coliru.stacked-crooked.com/a/361f238216cdbace) seems to largely be a comparison of resize+emplace vs reserve + push_back, and both would appear to benefit from using `emplace_back` rather than `push_back`. — kfsone, Jun 10 '16 at 18:21

Using an iterator to Divide an Array into Parts with Unequal Size

3 Answers3

Linked