
In this thread the top-rated answer received a lot of upvotes and even a bounty. It proposes the following algorithm:

void RemoveSpaces(char* source)
{
  char* i = source;
  char* j = source;
  while(*j != 0)
  {
    *i = *j++;         // UB?
    if(*i != ' ')
      i++;
  }
  *i = 0;
}

My knee-jerk reaction was that this code invokes undefined behavior: i and j point at the same memory location, so an expression such as *i = *j++; would access the same object twice, for purposes other than determining what to store, with no sequence point in between. Even though i and j are two different variables, they initially point at the same memory location.

However, I am not certain, as I don't quite see how two unsequenced accesses of the same memory location could cause any harm in practice.

Am I correct in stating that this is undefined behavior? And if so, are there any examples of how relying on such UB could cause harmful behavior?


EDIT

The relevant part of the C standard that would label this as UB is:

C99 6.5

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

C11 6.5

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

The actual meaning of the text should be the same in both versions of the standard, but I believe the C99 text is far easier to read and understand.

Lundin

4 Answers


There are two situations where accessing the same object twice without an intervening sequence point is undefined behaviour:

  1. If you modify the same object twice. For example:

    int x = (*p = 1, 1) + (*p = 2, 100);
    

    Obviously you wouldn't know whether *p is 1 or 2 after this, but the wording in the C standard says that it is undefined behaviour, even if you write

    int x = (*p = 1, 1) + (*p = 1, 100);
    

    so storing the same value twice doesn't save you.

  2. If you modify the object and also read it for some purpose other than determining its new value. By contrast,

    *p = *p + 1; 
    

    is fine: you read *p and you modify *p, but you read *p only to determine the value stored into *p.

Iharob Al Asimi
gnasher729
  • I like this answer a lot more than my own. My answer is not wrong, but this one explains it much better. I guess my understanding of the problem is not 100% clear; otherwise, even though English is not my native language, I would have expressed it better. – Iharob Al Asimi May 21 '15 at 14:10
  • So what you are saying is that the code _is_ UB, because of 2)? It reads `j` and in the same expression it also increases `j` by 1, which is not done for the purpose of determining what value to store in `i`. – Lundin May 21 '15 at 14:29

There is no UB here (it is even idiomatic C), because:

  • *i is only modified once (in *i =)
  • j is only modified once (in *j++)

Of course, in the posted code i and j can point at the same location (and do on the first pass), but they are still different variables. So in the line *i = *j++; :

  • the addresses held in both pointers (i and j) are read
  • the prior value (*j++) is read and used only to determine the value to be stored
  • only the pointer j is modified
  • source is modified through a pointer (i) that is not itself modified in this expression

It is definitely not UB.


The following invoke UB:

*i = *j++ + *j++;  // UB: j modified twice
i = i++ + 1;       // UB: i modified twice
Serge Ballesta
  • But for the code posted, `source` is modified twice... Are you saying that variable names are what determine UB, not memory accesses? – Lundin May 22 '15 at 06:14
  • @Lundin: no, source is modified only once. The other change is on the j pointer. But see my edit. – Serge Ballesta May 22 '15 at 07:43
  • Sorry, not changed but _accessed_, for purposes other than to determine which value to store. – Lundin May 22 '15 at 07:46
  • @Lundin: No. The value is stored in `*i` (say it is source), and the prior value of `*i` is only used to determine the value to store. The **address** is used for the second access. `*i = (*j)++;` would cause that UB, because there the *value* of source would be modified twice. – Serge Ballesta May 22 '15 at 07:58
  • That would still leave us with the pointer variable (the address of source) being accessed three times: first to determine what value to store, then to determine where to store it, then to be incremented. So the value of the _pointer_ is modified once, but it is also accessed for two other purposes unrelated to calculating the value to store (j+1). – Lundin May 22 '15 at 08:08
  • @Lundin: No, you must think about *variables*. Variable `j` is accessed only once. `*i = *j + *j++;` is UB, because j is accessed twice and one of the accesses is not used to determine the value to store. – Serge Ballesta May 22 '15 at 08:26

I don't think it causes UB. To my mind, it's as OK as writing

int k=0;
k=k; //useless but does no harm

It doesn't do any harm to read data from memory and then write it back to the same location.

ForceBru

Break down the expression *i = *j++. The precedence of the three operators is: ++ (postfix increment) highest, then * (pointer dereference), and = lowest.

So j++ will be evaluated first (its result is the old value of j, and its effect is to increment j), which means the expression is equivalent to

 temp = j++;
 *i = *temp;

where temp is a compiler-generated temporary pointer. Neither of these two expressions has undefined behaviour, which means the original expression does not have undefined behaviour either.

Peter
  • Evaluated first doesn't mean executed first. I think it is more likely that the compiler will translate this into machine code as `*i = *j; j++;`. There should be no need for a temporary object, unless perhaps the variables involved are volatile, but that is not the case here. – Lundin May 21 '15 at 14:25
  • I included the temporary because that is the semantics of postfix increment. As long as it produces the same net effect, a compiler can reorder and eliminate the temporary, as you describe. But it is not required to. – Peter May 21 '15 at 14:29
  • Yes, and for that reason there are no guarantees that a temporary variable will be created. If there were such a guarantee, the code would definitely not have harmful behavior. – Lundin May 21 '15 at 14:33
  • You're missing the point. I'm not saying a temporary will be created. I'm saying the compiler is required to produce the same net effect (the changes of `*i`, `*j`, and `j`) as if it had generated a temporary. – Peter May 21 '15 at 14:42
  • I don't believe so; the compiler is only required not to optimize out the consequences of any _side effects_, as in the formal definition of a side effect in the C standard. Reads of (non-volatile) variables are not side effects; only writes are. – Lundin May 21 '15 at 14:49
  • Beware: your argument would conclude that `*i = *i++ + *j++` is not UB, while it indeed is! `i + j` is evaluated first, `j` is then post-incremented, but nothing says which comes first: the increment of `i` or the assignment to `*i`. The two operations are not *sequenced*. – Serge Ballesta May 22 '15 at 07:42
  • Not true. I did a specific analysis of the expression asked about. I did not make general sweeping statements suggesting applicability to any other expression. – Peter May 22 '15 at 10:41