How does the C expression s[i ++] = t[j ++] exactly work in order?

Question

I'm new to C and stuck on understanding the expression s[i ++] = t[j ++]. I don't see how it's possible that an array element gets accessed with a variable, then the variable increments itself, and then the array element just accessed is accessed again with the original variable and assigned to another array's element. I'm confused. I think understanding the exact process might involve some low-level knowledge, but I don't want to digress too far. Is there any way to understand it easily and clearly?

`"is again accessed with the original variable"` -- This statement seems false to me. Every variable is used only once. — Andreas Wenzel, Jul 27 '21 at 11:31
Are you confused by postfix increment itself, or what guarantees exist that `i++` is evaluated before `j++` (or vice versa; I don't remember the intricacies of C's evaluation rules myself)? — chepner, Jul 27 '21 at 11:31
It's the same as using i and j, then incrementing them in following instructions. — stark, Jul 27 '21 at 11:32
@stark, Only if `s`, `t`, `i` and `j` are not pointers to `i` or `j`. If you move the increments to following statements, then the order of operations is defined. In the expression `s[i++] = t[j++]` it is not defined whether the increments happen before or after `s[i_orig] = t[j_orig]`, where `i_orig` and `j_orig` are the values of `i` and `j` before the increment. TL;DR: Yes, you are right, unless someone decides to write stupid code. — HAL9000, Jul 27 '21 at 15:04
@HAL9000 In that case, you must also show that `s` isn't a macro that expands to `j++; array`, and on and on... — stark, Jul 27 '21 at 15:24

Serge Ballesta · Answer 1 · score 7 · 2021-07-27T13:20:35.943

In C, the expression i++ causes i to be incremented, but the expression i++ itself evaluates to the value i had before being incremented. So the expression s[i++] = t[j++] has the same behaviour as:

s[i] = t[j];
i = i + 1;
j = j + 1;

except that the precise order of these three steps is not specified. For that reason, the rule is that a variable should be modified at most once in such an expression, and not also read elsewhere in it: s[i++] = t[i++] would invoke Undefined Behaviour.
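To see the equivalence concretely, here is a minimal sketch that copies a few elements both ways so the results can be compared. The function names copy_shorthand and copy_longhand are hypothetical, chosen only for illustration; both return the final value of i.

```c
/* Copies n elements from t (starting at index j0) into s (starting at
   index i0) with the shorthand idiom; returns the final value of i. */
int copy_shorthand(int s[], const int t[], int i0, int j0, int n)
{
    int i = i0, j = j0;
    for (int k = 0; k < n; k++)
        s[i++] = t[j++];   /* index with the old i and j, then bump both */
    return i;
}

/* The same copy written out "longhand"; returns the final value of i. */
int copy_longhand(int s[], const int t[], int i0, int j0, int n)
{
    int i = i0, j = j0;
    for (int k = 0; k < n; k++) {
        s[i] = t[j];
        i = i + 1;
        j = j + 1;
    }
    return i;
}
```

Called with the same arguments, both functions leave s in the same state and return the same i.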

answered Jul 27 '21 at 11:32 by Serge Ballesta · edited Jul 27 '21 at 13:20

There's no ordering - `i` might be increased earlier, so long as the old value of `i` is the result of evaluation. In this example there is no difference in observable behaviour, but there could be in other cases – M.M Jul 27 '21 at 11:35
What I meant is that `i` is increased **after being evaluated**. I know that all other side effects are unsequenced, the reason why I gave an example of UB. But I am afraid I will not be able to find more precise wordings because English is not my first language. – Serge Ballesta Jul 27 '21 at 11:39
My two cents: postfix and prefix operations are evaluated in each successive expression in **one** statement. For example, `int x,y; (x=0, y=0, x++,y+=++x*2);` will zero x **then** y, **then** increment x, **then** increment x again and **then** multiply the _current_ x by 2 and **then** add it to the _current_ y. Prefix operators are applied first, before evaluating the expression; postfix operators are applied last, after evaluating the expression. In between, operator precedence applies. Here, at the end you'll have x=2 and y=4. – Zilog80 Jul 27 '21 at 13:20
@AndreasWenzel: Thanks for your comment. I have included your wording in my answer. – Serge Ballesta Jul 27 '21 at 13:21

@Zilog80, "each successive expression" is not well defined in the general case. It makes some sense in your particular example, with reference to the operands of the comma operators, but not, for example, with the subexpressions of your `y+=++x*2`. Moreover, the language separates the evaluation of expressions (to compute a result value) from the application of side effects (such as incrementing the value of `x`). That is why, for example, evaluating the expression `i++ + i++` produces undefined behavior. – John Bollinger Jul 27 '21 at 13:28
@JohnBollinger I use the [GNU C convention](https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Top) for _operator_, _expression_ and _statement_. Do you think it would be clearer with `y+=((1+(x--))*2)`, giving x=0 and y=4? – Zilog80 Jul 27 '21 at 13:37
@Zilog80, it is not a question of code convention, and making operator grouping explicit via parentheses would change nothing about what I said. `(i++) + (i++)` also has undefined behavior. And so does `(++i) + (++i)`. – John Bollinger Jul 27 '21 at 13:40
@JohnBollinger I don't dismiss your comment; the goal here is to illustrate the actual C behavior regarding postfix and prefix operations. My question was about parentheses making things clearer. – Zilog80 Jul 27 '21 at 13:44

@Zilog80 The issue is that when you have an expression like `i = i++ + i++`, you can't understand it, because it's undefined. You can't understand it by looking at "each successive expression". You can't understand it by using parentheses. When expressions like these come up in discussion (as they all too often do), and when people use words like "each successive expression" or "parentheses making things clearer" to try to explain things, it doesn't work. – Steve Summit Jul 27 '21 at 13:53

Yes, @Zilog80, and my point is that I find your comment misleading. Introducing the comma operator just muddies the waters, and some of your "and then" relationships just aren't reflective of the language's requirements. That doesn't actually make a difference for your example expression, but it could lead to dangerous misunderstandings about expressions such as the examples I presented. – John Bollinger Jul 27 '21 at 13:54
@JohnBollinger It seems there is some misunderstanding. My initial comment was about considering _expressions_ in C _statements_ (the **comma** case) and how C behaves in that case. I don't see how it relates to the fact that i++ + i++ is undefined behavior. Does anything in my comment imply or negate that in any way? – Zilog80 Jul 27 '21 at 14:00
@Zilog80, indeed, there does seem to be a misunderstanding. To be perfectly clear, then: I am saying that *your comment expresses an incorrect assertion* where it says "**then** increment x again and **then** multiply the current x with 2". It is wrong for the same reason that `i++ + i++` has undefined behavior, and therefore, although it correctly predicts the effect of the particular example presented, *it promotes an incorrect view of C semantics that can lead to making and / or failing to recognize genuine errors*. – John Bollinger Jul 27 '21 at 14:07

@Zilog80 You correctly explained the behavior of the expression `x=0, y=0, x++,y+=++x*2`. However, the form of your explanation, if carelessly applied to the expression `i = i++ + i++`, could easily have led to a wrong conclusion. Basically, you got the right answer, but for a poor reason. And if someone remembered your reason, then later tried to reason from it to understand `i = i++ + i++`, they'd be apt to get the wrong answer. – Steve Summit Jul 27 '21 at 14:13

John and I are discouraging words like "postfix and prefix operations are evaluated in each successive expression" as a good, general way of understanding C expressions. See also [this answer](https://stackoverflow.com/questions/31087537/why-does-a-b-have-the-same-behavior-as-a-b/31088592#31088592). – Steve Summit Jul 27 '21 at 14:15
@SteveSummit I see, it's about phrasing, as `each successive expression` could be wrongly understood as `each successive "expression" in the line y+=++x*2` instead of each expression in `exp 1, exp 2, ...`. I'll look for better phrasing. – Zilog80 Jul 27 '21 at 14:32
@Zilog80 The other issue is that the comma operator implies left-to-right evaluation and a sequence point, whereas most other operators, and the commas that separate function arguments, do not. So although your expression `x++,y+=++x*2` is well-defined and can be explained, the quite similar expressions `x++ + (y+=++x*2)` and `f(x++, y+=++x*2)` are not and cannot. (That is, they can't be explained using the words "postfix and prefix operations are evaluated in each successive expression", or indeed any other words other than the word "undefined".) – Steve Summit Jul 27 '21 at 14:37
@SteveSummit To sum things up, I should not have used the `y+=++x*2` expression to illustrate the comma point, is that it? Would `y+=x*2,++x` have been less tricky? – Zilog80 Jul 27 '21 at 14:42
@SteveSummit Maybe phrased like this: For **comma-separated expressions** in a C statement, each expression is evaluated in turn. For example, `int x,y; (x=0, y=0, x++,y+=x*2,++x);` will zero x, then y, then increment x, then multiply the current x by 2 and add it to the current y, then increment x again. Prefix operators are applied first, before evaluating the expression; postfix operators are applied last, after evaluating the expression. In between, operator precedence applies. Here, at the end you'll have x=2 and y=2. – Zilog80 Jul 27 '21 at 15:20
@Zilog80 Well, BrotherYao's question didn't mention the comma operator, nor did Serge Ballesta's answer, so I'm not sure how we got bogged down in a discussion of it. (To be fair, neither the question nor this answer mentioned the undefined expression `i = i++ + i++`, either.) – Steve Summit Jul 27 '21 at 15:58
@Serge Ballesta Oh, in fact I read this expression in section 2.8 of the book "The C Programming Language, 2nd Edition" by Brian Kernighan and Dennis Ritchie (Ritchie created the C language). The book doesn't say that the expression causes undefined behavior, so I think it is well-defined. – BrotherYao Jul 27 '21 at 19:38
@BrotherYao Yes, the expression `s[i++] = t[j++]` is well-defined. If it were `s[i++] = t[i++]`, though, that would be another story. Sorry for all the extra discussion in these comments and answers. There are two, quite separate questions: (1) "What does `s[i++] = t[j++]` mean?" (2) "What are the implications for undefined behavior of careless usage of the `++` operator in other, similar expressions?" You only asked about (1), but several of us have felt the need to also talk about (2). – Steve Summit Jul 29 '21 at 12:50
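For anyone who wants to check the values quoted in this comment thread, here is a small sketch that wraps both comma-operator expressions in functions (comma_example_1 and comma_example_2 are hypothetical names). They are well-defined only because the comma operator fully evaluates its left operand, side effects included, before its right one; the same subexpressions joined by + or passed as separate function arguments would not be.

```c
/* x=0, y=0, x++, y += ++x * 2  ->  x ends at 2, y at 4 */
void comma_example_1(int *xp, int *yp)
{
    int x, y;
    (x = 0, y = 0, x++, y += ++x * 2);   /* x: 0 -> 1 -> 2, then y += 2 * 2 */
    *xp = x;
    *yp = y;
}

/* x=0, y=0, x++, y += x * 2, ++x  ->  x ends at 2, y at 2 */
void comma_example_2(int *xp, int *yp)
{
    int x, y;
    (x = 0, y = 0, x++, y += x * 2, ++x); /* y += 1 * 2, then x: 1 -> 2 */
    *xp = x;
    *yp = y;
}
```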

Steve Summit · Answer 2 · 2021-07-27T13:52:11.813

Like any other complicated-looking expression, it's easier to understand this if you break it down into parts.

The key is that innermost part (or "subexpression") i++. I assume you know what i++ does by itself, although in this example, we're hopefully going to get a deeper appreciation of what i++ is actually good for. Why would you want to "increment i, but return the old value"? What's the use of this? Well, the main use is that it's super useful for moving along an array.

Let's look at a simpler example. Suppose we have an array a that we want to store some numbers in. The most basic way is

int a[10];
a[0] = 12;
a[1] = 34;
a[2] = 5678;

Another very good way is to use a second variable like i to keep track of where we're storing:

i = 0;
a[i] = 12;
i = i + 1;
a[i] = 34;
i = i + 1;
a[i] = 5678;
i = i + 1;

I've written this out in "longhand", but of course in C, you would almost never write it this way, because the "C way" is the much more concise

i = 0;
a[i++] = 12;
a[i++] = 34;
a[i++] = 5678;
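A quick way to convince yourself that the two forms match is to run both and compare the arrays. In this sketch, fill_longhand and fill_shorthand are hypothetical names; each returns the final value of i, which is 3 in both cases.

```c
/* Fills a[0..2] the "longhand" way; returns the final value of i. */
int fill_longhand(int a[])
{
    int i = 0;
    a[i] = 12;
    i = i + 1;
    a[i] = 34;
    i = i + 1;
    a[i] = 5678;
    i = i + 1;
    return i;
}

/* Same effect with the a[i++] idiom; returns the final value of i. */
int fill_shorthand(int a[])
{
    int i = 0;
    a[i++] = 12;   /* store at a[0], then i becomes 1 */
    a[i++] = 34;   /* store at a[1], then i becomes 2 */
    a[i++] = 5678; /* store at a[2], then i becomes 3 */
    return i;
}
```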

So first, make sure you understand that the "shorthand" and "longhand" forms work exactly the same way. Make sure you understand that when we say something like

a[i++] = 34;

what this means is "store 34 into the slot in array a indicated by i, and then update i to be one more than it used to be, so that it indicates the next slot."

In other words, we use an expression like a[i++] whenever we want to move along an array and do something with its elements, one by one, in order.

So far we were storing values into the array, but the idiom works just as well for fetching values out of an array. For example, this code prints those three elements, again one at a time, in order:

i = 0;
printf("%d\n", a[i++]);
printf("%d\n", a[i++]);
printf("%d\n", a[i++]);
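The fetching idiom also fits naturally in a loop. As a sketch (sum_first is a hypothetical helper, not part of any library), this adds up the first n elements of an array, using a[i++] both to read each element and to step to the next one:

```c
/* Returns a[0] + a[1] + ... + a[n-1]. */
long sum_first(const int a[], int n)
{
    long total = 0;
    int i = 0;
    while (i < n)
        total += a[i++];   /* read a[i], then move i to the next slot */
    return total;
}
```

With the array a from above, sum_first(a, 3) gives 12 + 34 + 5678 = 5724.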

My point is, again, that any time you see an expression like a[i++], you should think "we're moving along the array".

So now, finally, we can look at the expression you initially asked about:

s[i++] = t[j++];

Here we have two instances of the idiom. We're using i to move along the array s, and we're using j to move along the array t. We're fetching from t as we move along, and we're storing the values into s.

I don't know whether s and t are arrays of characters, or integers, or what. Also I don't know that s and t are truly arrays -- they might actually be pointers, pointing into some arrays. But I don't really have to know those things to know that the essential meaning of s[i++] = t[j++] is "copy elements from array t to array s, using j to keep track of where we are in t, and i to keep track of where we are in s".
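Since the asker mentions finding this expression in section 2.8 of K&R, here is a sketch of the kind of function it typically appears in: appending one string to another. It is written in the spirit of that book's example rather than copied from it, named my_strcat (a hypothetical name) to avoid clashing with the standard library's strcat, and it assumes s has room for the combined string. Here s and t are char arrays, and the copy loop stops right after copying t's terminating '\0'.

```c
/* Appends string t to the end of string s.
   Assumes s is large enough to hold the result. */
void my_strcat(char s[], const char t[])
{
    int i = 0, j = 0;
    while (s[i] != '\0')               /* find the end of s */
        i++;
    while ((s[i++] = t[j++]) != '\0')  /* copy t, including its '\0' */
        ;
}
```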

[The above is an answer to your original question. The rest of this answer isn't directly related, but is essential to avoid inadvertently writing incorrect programs using ++ and --.]

As I said, the subexpression i++ and the idiom a[i++] are super useful for moving through arrays. But there are a couple of things to beware of. (Actually it's just one thing, but it crops up in lots of different ways.)

Earlier I wrote the code

i = 0;
printf("%d\n", a[i++]);
printf("%d\n", a[i++]);
printf("%d\n", a[i++]);

to print the first three elements of the array a. But it prints them as bare, isolated numbers. What if I want to always see which array index each number comes from? That is, what if I'm tempted to write something like this:

i = 0;
printf("%d: %d\n", i, a[i++]);      /* WRONG */
printf("%d: %d\n", i, a[i++]);      /* WRONG */
printf("%d: %d\n", i, a[i++]);      /* WRONG */

If I wrote this, my intent would be that I would see the obvious display

0: 12
1: 34
2: 5678

But when I actually tried it just now, I got this instead:

1: 12
2: 34
3: 5678

The numbers 12 and 34 and 5678 are right, but the indices 1, 2, and 3 are all wrong -- they're off by one! How did that happen?

And the answer is that although i++ is, as I said, "super useful", it turns out that there's a fine line between "super useful" and what's called undefined behavior.

That printf call

printf("%d: %d\n", i, a[i++]);      /* WRONG */

looks fine, but it's not actually well-defined. The compiler does not necessarily evaluate the arguments left-to-right, so it's not actually guaranteed that it will use the old value of i for the %d: part. (Strictly speaking, reading i in one argument while modifying it with i++ in another is undefined behavior, not just an unspecified order.) The compiler might evaluate things from right to left, meaning that a[i++] will happen first, meaning that %d: will print the new value instead, which appears to be what happened when I tried it.

Here's another potential issue. Your original question was about

s[i++] = t[j++];

which, as we've seen, copies elements from t to s based on two possibly-different indices i and j. But what if we know we always want to copy t[1] to s[1], t[2] to s[2], t[3] to s[3], etc.? That is, what if we know that i and j will always be the same, so we don't even need separate i and j variables? How would we write that? Our first try might be

s[i++] = t[i++];                    /* WRONG */

but that can't be right, because now we're incrementing i twice, and we'll probably do something totally broken like copying t[1] to s[2] and t[3] to s[4]. But if we want to only increment i once, should it be

s[i++] = t[i];                      /* WRONG */

or

s[i] = t[i++];                      /* WRONG */

But the answer is that neither of these will work. In expressions like these, which have i in one place and i++ in the other place, there's no way to tell whether i gets the old value or the new value. (In particular, there's no left-to-right or right-to-left rule that would tell us.)

So although expressions like i++ and a[i++] are indeed super useful, you have to be careful when you use them, to make sure you don't go over the edge and have too much happening at once, such that the evaluation order becomes undefined. Sometimes this means you have to back off, and not use the "super useful" idiom, after all. For example, a safe way to print those values would be

printf("%d: %d\n", i, a[i]); i++;
printf("%d: %d\n", i, a[i]); i++;
printf("%d: %d\n", i, a[i]); i++;
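The loop form of the same fix is the usual idiom. This sketch (format_indexed is a hypothetical name) writes the lines into a buffer with sprintf instead of printing them, so the result can be checked; it assumes out is large enough. The i++ in the for header runs separately, after each call, so each sprintf call sees a single, consistent value of i.

```c
#include <stdio.h>

/* Writes "index: value\n" for a[0..n-1] into out and returns the number
   of characters written. Assumes out is large enough for all lines. */
int format_indexed(char out[], const int a[], int n)
{
    int len = 0;
    for (int i = 0; i < n; i++)   /* i is incremented only here */
        len += sprintf(out + len, "%d: %d\n", i, a[i]);
    return len;
}
```

For the array a above, out ends up holding the three lines "0: 12", "1: 34" and "2: 5678".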

and a safe way to copy from t[i] to s[i] would be

s[i] = t[i]; i++;
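Put in a loop, this keeps the one rule that matters: i is read in the assignment and modified only in a statement of its own. A minimal sketch, with copy_same_index as a hypothetical name:

```c
/* Copies t[0..n-1] into s[0..n-1] using a single index. */
void copy_same_index(int s[], const int t[], int n)
{
    int i = 0;
    while (i < n) {
        s[i] = t[i];   /* i is only read here */
        i++;           /* and only modified here, in its own statement */
    }
}
```

A for loop, for (i = 0; i < n; i++) s[i] = t[i];, does exactly the same thing.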

You can read more in this answer about how to recognize well-defined expressions involving ++ and --, and how to avoid the undefined ones.

score 1 · Answer 3 · answered Jul 27 '21 at 14:12

The evaluations of s[i++] and t[j++] are unsequenced relative to each other. Semantically, it's equivalent to:

t1 = i;
t2 = j;

s[t1] = t[t2]; 
i = i + 1;
j = j + 1;

with the caveat that the last three assignments can happen in any order, even simultaneously (either in parallel or interleaved)¹. The compiler doesn't have to create temporaries, either - the whole thing can be evaluated as

s[i] = t[j];
i = i + 1;
j = j + 1;

Alternately, the side effects of i++ and j++ can be applied before the update to s:

t1 = j;
j = j + 1;
t2 = i;
i = i + 1;
s[t2] = t[t1];

The current values of i and j must be known before you can index into the arrays, and the value of t[j] must be known before it can be assigned to s[i], but beyond that there's no fixed order of evaluation or of the application of side effects.

¹ This is why expressions like x = x++ or a = b++ * b++ or a[i] = i++ all invoke undefined behavior: there's no fixed order for evaluating or applying side effects, so the results can vary by compiler, compiler settings, even by the surrounding code, and the results don't have to be consistent from run to run.
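Because s[i++] = t[j++] modifies i and j once each and reads them only to compute its own indices, each of the orderings described in this answer ends in the same state. Here is a small sketch of two of them; order_after and order_before are hypothetical names, and i and j are passed by pointer so the caller can see the updates.

```c
/* Side effects applied after the store. */
void order_after(int s[], const int t[], int *i, int *j)
{
    int t1 = *i, t2 = *j;   /* old values of i and j */
    s[t1] = t[t2];
    *i = *i + 1;
    *j = *j + 1;
}

/* Side effects applied before the store. */
void order_before(int s[], const int t[], int *i, int *j)
{
    int t1 = *j;
    *j = *j + 1;
    int t2 = *i;
    *i = *i + 1;
    s[t2] = t[t1];          /* still uses the old values */
}
```

Both leave s, i and j exactly as s[i++] = t[j++] itself would.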
