85

Consider this topic a sequel of the following topic:

Previous installment
Undefined behavior and sequence points

Let's revisit this funny and convoluted expression (the italicized phrases are taken from the above topic *smile* ):

i += ++i;

We say this invokes undefined behavior. I presume that when we say this, we implicitly assume that the type of i is one of the built-in types.

What if the type of i is a user-defined type? Say its type is Index which is defined later in this post (see below). Would it still invoke undefined-behavior?

If yes, why? Is it not equivalent to writing i.operator+=(i.operator++()); or even syntactically simpler i.add(i.inc());? Or, do they too invoke undefined-behavior?

If no, why not? After all, the object i gets modified twice between consecutive sequence points. Recall the rule of thumb: an expression can modify an object's value only once between consecutive "sequence points". If i += ++i is an expression, then it must invoke undefined behavior. If so, then its equivalents i.operator+=(i.operator++()); and i.add(i.inc()); must also invoke undefined behavior, which seems to be untrue (as far as I understand)!

Or is i += ++i not an expression to begin with? If not, then what is it, and what is the definition of an expression?

If it's an expression, and at the same time, its behavior is also well-defined, then it implies that the number of sequence points associated with an expression somehow depends on the type of operands involved in the expression. Am I correct (even partly)?


By the way, how about this expression?

//Consider two cases:
//1. If a is an array of a built-in type
//2. If a is user-defined type which overloads the subscript operator!

a[++i] = i; //Taken from the previous topic. But here type of `i` is Index.

You must consider this too in your response (if you know its behavior for sure). :-)


Is

++++++i;

well-defined in C++03? After all, it is equivalent to this:

((i.operator++()).operator++()).operator++();

class Index
{
    int state;

    public:
        Index(int s) : state(s) {}
        Index& operator++()
        {
            state++;
            return *this;
        }
        Index& operator+=(const Index & index)
        {
            state+= index.state;
            return *this;
        }
        operator int()
        {
            return state;
        }
        Index & add(const Index & index)
        {
            state += index.state;
            return *this;
        }
        Index & inc()
        {
            state++;
            return *this;
        }
};
Nawaz
  • 13
    +1 great question, which inspired great answers. I feel I ought to say that it's still horrible code which should be refactored to be more readable, but you probably know that anyway :) – Philip Potter Jan 09 '11 at 08:53
  • since when s++ is same as ++s? –  Jan 09 '11 at 08:55
  • 4
    @What is the Question: who said it's same? or who said it's not same? Does it not depend on how you implement them? (Note: I'm assuming type of `s` is user-defined type!) – Nawaz Jan 09 '11 at 08:59
  • 5
    I don't see any *scalar* object being modified twice between two sequence points... – Johannes Schaub - litb Jan 09 '11 at 09:06
  • @Nawaz: right... just as i thought... –  Jan 09 '11 at 09:07
  • 3
    @Johannes : then it's about *scalar* object. What is it? I wonder why I never heard of it before. Maybe because the tutorials/C++-faq do not mention it, or do not emphasize it? Is it different from objects of *built-in* type? – Nawaz Jan 09 '11 at 09:14
  • 3
    @Phillip : Obviously, I'm not going to write such code in real life; in fact, no sane programmer is going to write it. These questions are usually devised so that we can understand the whole business of undefined-behavior and sequence points better! :-) – Nawaz Jan 09 '11 at 09:29
  • @Nawaz my faq entry covers it. It says "really, it applies to scalar objects, because other objects are either non-modifiable (arrays) or just aren't applicable to this rule (class objects)". – Johannes Schaub - litb May 13 '11 at 11:20
  • @Johannes: Yeah I saw that this morning. :-) – Nawaz May 13 '11 at 11:34
  • 2
    In C++11 `i += ++i` becomes well defined even for built in types. http://stackoverflow.com/q/10655290/365496 – bames53 May 18 '12 at 17:42
  • 1
    @Nawaz: very good question. never thought about user defined types. – Destructor Aug 28 '15 at 11:28

5 Answers

48

It looks like the code

i.operator+=(i.operator ++());

works perfectly fine with regard to sequence points. Section 1.9/17 of the C++ ISO standard says this about sequence points and function evaluation:

When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all function arguments (if any) which takes place before execution of any expressions or statements in the function body. There is also a sequence point after the copying of a returned value and before the execution of any expressions outside the function.

This indicates, for example, that the call i.operator ++(), used as the argument to operator +=, has a sequence point after its evaluation. In short, because overloaded operators are functions, the normal sequencing rules for function calls apply.
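As a minimal sketch of that point, the operator form and the explicit member-call form below go through the same two function calls, so they should leave the object in the same state. The `Index` class is a trimmed copy of the one in the question; the `value()` accessor and the two helper functions are hypothetical names added only for this illustration.

```cpp
#include <cassert>

// Trimmed copy of the Index class from the question.
class Index {
    int state;
public:
    explicit Index(int s) : state(s) {}
    Index& operator++() { ++state; return *this; }
    Index& operator+=(const Index& other) { state += other.state; return *this; }
    int value() const { return state; }  // hypothetical accessor, added for this sketch
};

// Hypothetical helper: operator syntax.
int compound_via_operators(int start) {
    Index i(start);
    i += ++i;  // resolves to i.operator+=(i.operator++())
    return i.value();
}

// Hypothetical helper: explicit member-call syntax.
int compound_via_calls(int start) {
    Index i(start);
    // The argument (the increment call) completes before operator+= starts running.
    i.operator+=(i.operator++());
    return i.value();
}
```

Starting from 1, the increment runs to completion first (state becomes 2), and operator+= then adds the object's own state to itself, so both helpers return 4.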

Great question, by the way! I really like how you're forcing me to understand all the nuances of a language that I already thought I knew (and thought that I thought that I knew). :-)

templatetypedef
12

Analog literals come to mind: http://www.eelis.net/C++/analogliterals.xhtml

  unsigned int c = ( o-----o
                     |     !
                     !     !
                     !     !
                     o-----o ).area;

  assert( c == (I-----I) * (I-------I) );

  assert( ( o-----o
            |     !
            !     !
            !     !
            !     !
            o-----o ).area == ( o---------o
                                |         !
                                !         !
                                o---------o ).area );
11

As others have said, your i += ++i example works with the user-defined type since you're calling functions, and function calls introduce sequence points.

On the other hand, a[++i] = i is not so lucky, assuming that a is your basic array type, or even a user-defined one. The problem here is that we don't know which part of the expression containing i is evaluated first. It could be that ++i is evaluated and passed to operator[] (or the built-in subscript) to retrieve the object there, and then the value of i (after the increment) is assigned to it. Alternatively, the right-hand side could be evaluated first and stored for later assignment, and only then is the ++i part evaluated.
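One way to avoid depending on the evaluation order is to split the statement so that the increment and the read of `i` sit in separate full expressions. The following is only a sketch using a built-in index and `std::vector`; the helper names `store_new_index` and `store_old_index` are hypothetical and simply spell out the two possible intents.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Intent A: increment first, then store the *new* value of i at a[i].
std::vector<int> store_new_index(std::vector<int> a, std::size_t i) {
    ++i;                           // the increment is complete at the end of this statement
    a[i] = static_cast<int>(i);    // i is only read here
    return a;
}

// Intent B: store the *old* value of i at the incremented position.
std::vector<int> store_old_index(std::vector<int> a, std::size_t i) {
    const std::size_t old = i;     // capture the value before the increment
    ++i;
    a[i] = static_cast<int>(old);
    return a;
}
```

Either version states explicitly which value of `i` ends up in the array, which the single-statement form leaves open.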

4xy
Edward Strange
  • so... is the result therefore unspecified rather than UB, since the order of evaluation of expressions is unspecified? – Philip Potter Jan 09 '11 at 10:26
  • @Philip: unspecified means that we expect the compiler to specify the behavior, whereas undefined places no such obligation. I think it is undefined here, to give compilers more room for optimizations. – Matthieu M. Jan 09 '11 at 11:25
  • @Noah : I also posted a response. Please check it out, and let me know your thoughts. :-) – Nawaz Jan 09 '11 at 11:36
  • @Matthieu: no, implementation-defined requires the compiler to specify a behaviour. unspecified requires the compiler to choose something, but it doesn't have to document it. it can be different each time if it likes. – Philip Potter Jan 09 '11 at 11:58
  • 1
    @Philip: the result is UB, because of the rule in 5/4: "The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined.". If all of the allowable orderings had sequence points between the modification `++i`, and the reading of `i` on the RHS of the assignment, then the order would be unspecified. Because one of the allowable orderings does those two things with no intervening sequence point, behavior is undefined. – Steve Jessop Jan 12 '11 at 01:46
  • ... you can look at this as basically just saying the perfectly sensible thing, "if it is unspecified whether behavior is defined or undefined, then behavior is undefined". Because as you say, unspecified behavior can be different each time from the possibilities set out by the standard. And the possibilities set out by the standard included some orderings with defined behaviour, and some orderings with undefined behavior. So the compiler is, as it were, permitted to always "choose" the UB. Noah is just describing a couple of plausible implementation details, not all legal possibilities. – Steve Jessop Jan 12 '11 at 01:50
  • @Steve: first, i think that just defines unspecified behaviour as undefined behaviour, which it is not. Under UB, *anything* may happen, while under unspecified behaviour, anything from a restricted set may happen. second, is it still UB for user-defined `i`? The function calls introduced by `operator++` force sequence points between reading and writing, whether the LHS `i++` is evaluated before or after the RHS `i`. – Philip Potter Jan 12 '11 at 07:47
  • 1
    @Philip: It doesn't just define unspecified behavior as undefined behavior. Again, *if* the range of unspecified behavior includes some which is undefined, *then* the overall behavior is undefined. *If* the range of unspecified behavior is defined in all possibilities, *then* overall behavior is unspecified. But you're right on the second point, I was thinking of a user-defined `a` and builtin `i`. – Steve Jessop Jan 12 '11 at 11:21
  • @Steve Jessop: Would it be fair to think of an expression like "foo++" as saying "as soon as other code is no longer allowed to assume 'foo' is unchanged, attach a demon to 'foo's memory cells which will escape and wreak havoc if unsuspecting code tries to access them; just before other code is allowed to access foo, remove the demon and load the memory cells with the proper value"? – supercat Mar 21 '11 at 21:04
  • @supercat: sounds about right, if your `foo++` is a sub-expression of a full-expression. Then, "as soon as other code is no longer allowed" is the previous sequence point, and "just before other code is allowed" is the next sequence point. You can probably think of `foo` as being possessed during that "time", with that sub-expression being the only part of the expression allowed to access it. – Steve Jessop Mar 21 '11 at 21:51
  • @Steve Jessop: Right. My point, which many people fail to realize, is that not only does one have no guarantee whether the variable has been written, there's no guarantee that attempting to read or write it won't make very bad things happen. Actually, some caching architectures involve something a bit like "checking out" shared memory areas and later checking them back in. If hardware imposes locking, trying to access things at incorrect times could result in deadlock. – supercat Mar 21 '11 at 22:45
  • I understand why `a[++i] = i` is not OK. How about `a[(++i)] = i`? – gsamaras Jul 01 '15 at 11:47
  • The type of `a` is irrelevant for whether this is UB or not -- what matters is the type of `i`. If `i` is a builtin type, then it is UB, while if `i` is a user-defined type it will be unspecified behavior. – Chris Dodd Oct 22 '17 at 20:58
8

I think it's well-defined:

From the C++ draft standard (n1905) §1.9/16:

"There is also a sequence point after the copying of a returned value and before the execution of any expressions outside the function. Several contexts in C++ cause evaluation of a function call, even though no corresponding function call syntax appears in the translation unit. [ Example: evaluation of a new expression invokes one or more allocation and constructor functions; see 5.3.4. For another example, invocation of a conversion function (12.3.2) can arise in contexts in which no function call syntax appears. - end example ] The sequence points at function-entry and function-exit (as described above) are features of the function calls as evaluated, whatever the syntax of the expression that calls the function might be."

Note the final sentence of the quote. This means there is indeed a sequence point after the increment function call (i.operator ++()) but before the compound assignment call (i.operator+=).
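A small sketch of what "function calls as evaluated, whatever the syntax" can look like in practice: each `++` below is a separate call to `operator++`, and the `return` statement invokes the conversion function even though no call syntax appears. The class is a trimmed version of `Index` from the question (with `explicit` and `const` added for the sketch), and `triple_increment` is a hypothetical helper.

```cpp
#include <cassert>

// Trimmed version of the question's Index class.
class Index {
    int state;
public:
    explicit Index(int s) : state(s) {}
    Index& operator++() { ++state; return *this; }
    operator int() const { return state; }  // conversion function
};

// Hypothetical helper: applies the chained pre-increment from the question.
int triple_increment(int start) {
    Index i(start);
    ++++++i;   // ((i.operator++()).operator++()).operator++()
    return i;  // implicit call to operator int()
}
```

Because each call returns before the next one begins, the three increments are applied one after another, and `triple_increment(0)` yields 3.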

Matthew Flaschen
6

Alright. After going through the previous responses, I thought again about my own question, particularly this part, which only Noah attempted to answer, though I'm not completely convinced by his answer.

a[++i] = i;

Case 1:

If a is an array of a built-in type, then what Noah said is correct. That is,

a[++i] = i is not so lucky assuming that a is your basic array type, ~~or even a user defined one~~. The problem you've got here is that we don't know which part of the expression containing i is evaluated first.

So a[++i] = i invokes undefined behavior, or the result is unspecified. Either way, it's not well-defined!

PS: In the above quotation, the strike-through is of course mine.

Case 2:

If a is an object of a user-defined type which overloads operator[], then again there are two cases.

  1. If the return type of the overloaded operator[] is a built-in type, then again a[++i] = i invokes undefined behavior or the result is unspecified.
  2. But if the return type of the overloaded operator[] is a user-defined type, then the behavior of a[++i] = i is well-defined (as far as I understand), since in this case a[++i] = i is equivalent to writing a.operator[](++i).operator=(i);, which is the same as a[++i].operator=(i);. That is, operator= gets invoked on the object returned by a[++i], which seems to be well-defined: by the time a[++i] returns, ++i has already been evaluated, and the returned object then calls operator=, passing the updated value of i as the argument. Note that there is a sequence point between these two calls. The syntax ensures that there is no competition between them: operator[] is invoked first, and consequently the argument ++i passed to it is also evaluated first.

Think of this as someInstance.Fun(++k).Gun(10).Sun(k).Tun(); in which each consecutive function call returns an object of some user-defined type. To me, this situation seems more like this: eat(++k);drink(10);sleep(k), because in both situations, there exists sequence point after each function call.
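Whether the arguments of later calls in such a chain may be evaluated before the earlier calls run is exactly what the comments below dispute. A sketch that does not depend on the answer is to give each call its own statement, as in the eat/drink/sleep form. The `Recorder` type and `run_sequenced` helper are hypothetical stand-ins; only the member names `Fun`, `Gun`, and `Sun` come from the example above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the object each call in the chain returns.
struct Recorder {
    std::vector<int> seen;  // arguments in the order the calls received them
    Recorder& Fun(int v) { seen.push_back(v); return *this; }
    Recorder& Gun(int v) { seen.push_back(v); return *this; }
    Recorder& Sun(int v) { seen.push_back(v); return *this; }
};

// Each call is a separate full expression, so the ++k in Fun's argument
// is complete before the k in Sun's argument is read.
std::vector<int> run_sequenced(int k) {
    Recorder r;
    r.Fun(++k);
    r.Gun(10);
    r.Sun(k);
    return r.seen;
}
```

Written this way, `Sun` always receives the incremented value of `k`, regardless of how a compiler orders argument evaluation within a single chained expression.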

Please correct me if I'm wrong. :-)

Nawaz
  • you missed the point. The problem is that it's not specified whether the `i` expression or the `++i` expression gets evaluated first; *whether i or a are of user-defined type or not*. – Philip Potter Jan 09 '11 at 10:47
  • @Philip : I added some explanation to my post. Please read the last few lines! – Nawaz Jan 09 '11 at 10:52
  • @Nawaz: in `someInstance.Fun(++k).Gun(10).Sun(k).Tun();` it is possible that the `k` within `Sun(k)` is evaluated before the `++k` in `Fun(++k)`. Once this is done, the compiler will need to keep the result somewhere, because `Fun()` *must* be evaluated before `Sun()`, because `Sun()` depends on `Fun()`'s output. This dependency does *not* extend to the arguments of `Sun()` and `Fun()`, which is where your problem lies. – Philip Potter Jan 09 '11 at 12:33
  • @Philip Potter : *it is possible that the k within Sun(k) is evaluated before the ++k in Fun(++k)*... Why is that? Any reference from the language specification? – Nawaz Jan 09 '11 at 15:02
  • @Nawaz sadly I don't have a copy of the C++ standard. But [this question](http://stackoverflow.com/questions/621542/compilers-and-argument-order-of-evaluation-in-c) covers similar issues. – Philip Potter Jan 09 '11 at 17:54
  • @Philip : that issue is different from this. in that the two expressions `i++` and `i++` are not separated by sequence points. `i` gets modified twice between consecutive sequence points; but here first of all, there is sequence point after **each** function call; second, since the expression `++i` appears in the first function call, and after that there is sequence point, that means between expressions `++i` and `i` there is sequence points as well. – Nawaz Jan 09 '11 at 18:08
  • 1
    @Nawaz `k++` and `k` are *not* separated by sequence points. They can both be evaluated before either `Sun` or `Fun` are evaluated. The language *only* requires that `Fun` is evaluated before `Sun`, not that `Fun`'s arguments are evaluated before `Sun`'s arguments. I'm kind of explaining the same thing again without being able to provide a reference, so we're not going to progress from here. – Philip Potter Jan 09 '11 at 18:12
  • @Philip : *k++ and k are not separated by sequence points*... how? just because them `;` is not there? – Nawaz Jan 09 '11 at 18:14
  • @Philip : how this situation is any different from `eat(i++);drink(10);sleep(i);`? – Nawaz Jan 09 '11 at 18:16
  • 1
    @Nawaz: because there is nothing that defines a sequence point separating them. There are sequence points before and after `Sun` executes, but `Fun`'s argument `++k` may be evaluated before or after that. There are sequence points before and after `Fun` executes, but `Sun`'s argument `k` may be evaluated before or after that. Therefore, one possible case is that both `k` and `++k` are evaluated before either `Sun` or `Fun` are evaluated, and so both are before the function-call sequence points, and so there is no sequence point separating `k` and `++k`. – Philip Potter Jan 09 '11 at 18:17
  • 1
    @Philip : I repeat : how this situation is any different from `eat(i++);drink(10);sleep(i);`? ... even now, you could say `i++` may be evaluated before or after that? – Nawaz Jan 09 '11 at 18:23
  • @Nawaz because in that situation, the `;` is an explicit sequence point between `i++` and `i`. >_< I need a whiteboard and a face-to-face to make this clearer. – Philip Potter Jan 09 '11 at 18:30
  • @Philip : how does it matter? as long as there is a sequence point **between** these two expressions, it seems fine to me. I don't see any difference! – Nawaz Jan 09 '11 at 18:34
  • 1
    @Nawaz: how can i make myself more clear? In the Fun/Sun example, there is **no** sequence point between `k` and `++k`. In the eat/drink example, there **is** as sequence point between `i` and `i++`. – Philip Potter Jan 09 '11 at 18:40
  • 3
    @Philip: that doesn't make sense at all. A sequence point exists between Fun() and Sun(), but none exists between their arguments? It's like saying that sequence points exist between `eat()` and `sleep()`, but not even one exists between their arguments. How can arguments to two function calls separated by sequence points belong to the *same* sequence points? – Nawaz Jan 09 '11 at 19:44
  • @Nawaz: I clearly am failing at explaining it to you in this short space. perhaps you can ask a question of the whole SO crowd. – Philip Potter Jan 09 '11 at 20:42
  • 1
    @Philip : Please join this topic here : http://stackoverflow.com/questions/4709727/is-this-code-well-defined – Nawaz Jan 17 '11 at 03:32
  • @Nawaz: just for completeness, you could also review the case of an array of user-defined type. I think it's the same as your case 2.1. – Andriy Tylychko Sep 24 '11 at 19:34