
*char_ptr++ -= 0x20

Why does this expression increment the pointer just once? It can be decomposed as

*char_ptr++ = *char_ptr++ - 0x20

That would mean the pointer is incremented twice: once on the right side and once on the left. But it is incremented only once.

This expression is part of the function below:

void to_lower_case(char *char_ptr)
{
    while(*char_ptr)
    {
        if (*char_ptr < 0x41)
        {
            printf("\nInvalid login\n");
            exit(0);
        }
        if (*char_ptr > 0x5A)
            *char_ptr++ -= 0x20;// *char_ptr++ = *char_ptr++ - 0x20
    }
}
helsereet
  • This is not actually undefined. Subexpressions are guaranteed to be only evaluated once. – zch Jul 27 '19 at 20:37
  • Nope, it can be decomposed as: `*char_ptr -= 0x20; char_ptr++;` You use the `++` operator only once, so the pointer is incremented only once, most probably at the end of the statement. – Luis Colorado Jul 29 '19 at 06:27

3 Answers


One version of the C Language Standard (N1570, section 6.5.16.2, Compound assignment) says:

A compound assignment of the form E1 op= E2 is equivalent to the simple assignment expression E1 = E1 op (E2), except that the lvalue E1 is evaluated only once, ...

Using the -= expression, this says that

*char_ptr++ -= 0x20;

is the equivalent of

*char_ptr++ = *char_ptr++ - 0x20;

except that the *char_ptr++ part is evaluated only once (so the pointer increment will only happen once).

Equivalently, it is the same as

*char_ptr = *char_ptr - 0x20;
char_ptr++;
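
A quick way to check this is to measure how far the pointer moves. The following is a minimal sketch, assuming an ASCII execution character set; the helper name `advance_after_compound_assign` is made up for illustration and does not appear in the question.

```c
#include <stddef.h>

/* Hypothetical helper: applies `*p++ -= 0x20` once at the start of buf
   and returns how many elements the pointer advanced. */
ptrdiff_t advance_after_compound_assign(char *buf)
{
    char *p = buf;
    *p++ -= 0x20;   /* modifies buf[0]; p ends up pointing at buf[1] */
    return p - buf;
}
```

Called on a writable array holding "ab", this should return 1 and leave "Ab" in the array: only the first element is changed, and the pointer moves by one.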
1201ProgramAlarm
  • I think that this is actually not the relevant rule, because this applies only to the value computation of `*char_ptr`, but not to the side effect (for which a separate rule under 6.5.16 (3) exists). – Stephan Lechner Jul 27 '19 at 21:01
  • link to [C11 6.5.16.2](http://port70.net/~nsz/c/c11/n1570.html#6.5.16.2) – pmg Jul 27 '19 at 21:01
  • @StephanLechner: This is the relevant rule. C 2018 5.1.2.3 2 says “*Evaluation* of an expression in general includes both value computations and initiation of side effects.” (Italics in the original.) – Eric Postpischil Jul 27 '19 at 21:06
  • @Eric Postpischil: clear, but without 6.5.16 (3) it would be UB, wouldn't it? The side effect needs to be applied after all value computations have occurred, and the "evaluated only once" statement of 6.5.16.2 does not say anything about that. – Stephan Lechner Jul 27 '19 at 21:12
  • @StephanLechner: Regarding 6.5.16 3, that tells us about the effect `-=` has on the object being assigned. It is not relevant to the behavior of `++`. `++` has the side effect of updating `char_ptr`. The `-=` has the side effect of updating the object resulting from `*char_ptr`. Unless `char_ptr` is pointing into itself, these are separate objects, and there is no conflict between the side effects. – Eric Postpischil Jul 27 '19 at 21:32
  • @StephanLechner: The side effect of the `-=` needs to be applied after the value computations occur. The side effect of the `++` can occur anytime with respect to the various things in the assignment. I see no risk of undefined behavior unless `char_ptr` points into itself. – Eric Postpischil Jul 27 '19 at 21:34

When "we" say a += b; is the same as a = a + b; "we" mean that as a 'simplification', not text substitution as in a macro.

What happens with *char_ptr++ -= 0x20; is the equivalent of

// *char_ptr++ -= 0x20;
/*1*/ char *tmp = char_ptr;
/*2*/ char_ptr++; // char_ptr += 1; // char_ptr = char_ptr + 1;
/*3*/ *tmp -= 0x20; // *tmp = *tmp - 0x20;

Note that /*2*/ can happen where I put it or after /*3*/.
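
To see that the three-step version and the one-line version agree, they can be wrapped in two small functions and run on identical input. This is only a sketch: `step_compound` and `step_decomposed` are hypothetical names chosen for illustration, and the expected results mentioned afterwards assume ASCII.

```c
/* Hypothetical helper: the original one-line form.
   Returns the pointer value after the statement. */
char *step_compound(char *p)
{
    *p++ -= 0x20;
    return p;
}

/* Hypothetical helper: the same effect spelled out in three steps. */
char *step_decomposed(char *p)
{
    char *tmp = p;   /* 1: remember the original position */
    p++;             /* 2: advance the pointer */
    *tmp -= 0x20;    /* 3: modify the element at the original position */
    return p;
}
```

Given two writable copies of "xyz", both functions should return a pointer to the second element and leave "Xyz" behind.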

pmg
  • `tmp` would be `char_ptr`, not `+ 1`. It is post-increment. – Acorn Jul 27 '19 at 20:48
  • Welcome! :-) Note that in that case you can drop `tmp` and simply use `char_ptr`, too. – Acorn Jul 27 '19 at 20:49
  • Better with the statement order changes, @Acorn? Thanks again :) – pmg Jul 27 '19 at 20:51
  • @ikegami Not sure what you mean. `tmp` is exactly equal to `char_ptr` now, so there is no need to use it. i.e. `*char_ptr -= 0x20; char_ptr++;` is simpler to understand. – Acorn Jul 27 '19 at 20:54
  • @Acorn: the thing is the side-effect of changing `char_ptr` can happen before the assignment as in the ordering above. – pmg Jul 27 '19 at 20:56
  • If you reorder them, of course; but not when you had them reversed. Anyway, we should not reorder them. While it usually does not matter, it is not always the case. – Acorn Jul 27 '19 at 21:10
  • @Acorn, I became confused, which is why I deleted the comment before you even replied :) /// Re "*but not when you had them reversed.*", pmg always had a statement saying the order could be reversed. – ikegami Jul 28 '19 at 01:45

You've gotten the answer from the C Standard, but let's also think about why the definition is the way it is, why the expression *char_ptr++ -= 0x20 does not increment char_ptr twice.

Suppose I had a null-terminated array of characters (as in fact the original code does), and suppose I wanted to subtract 0x20 from each character (which is similar to what the original code is doing). I might write

while(*char_ptr)
    *char_ptr++ -= 0x20;

Now, even before we figure out exactly what this code does, certain things jump out at us. In particular, the while(*char_ptr) part and the char_ptr++ part immediately tell us that this code is looping over the characters until it hits a null (or zero) character -- in other words, it is looping over the characters of a string. This is an extremely common idiom in C code.

And in this case, what it's doing with each character of the string is of course subtracting the value 0x20 from it.

So if the expression *char_ptr++ -= 0x20 did end up incrementing char_ptr twice, this code wouldn't work! And that would be sad. So it's good that the definition of the -= operator (and indeed the definition of all the "op=" operators) is that the left-hand side is evaluated only once. And it's no coincidence that they were defined this way, either -- they're defined this way precisely so that code like this works as expected.
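
As a rough illustration of the idiom, the loop can be wrapped in a function that also counts how many characters it visits. The name `subtract_0x20_from_each` and the counter are additions for this sketch, not part of the original code.

```c
#include <stddef.h>

/* Hypothetical helper: subtracts 0x20 from every character of a
   null-terminated string and returns the number of characters visited. */
size_t subtract_0x20_from_each(char *char_ptr)
{
    size_t visited = 0;
    while (*char_ptr) {
        *char_ptr++ -= 0x20;   /* one element modified, one step forward */
        visited++;
    }
    return visited;
}
```

On a writable array holding "abc" this should visit 3 characters and, on an ASCII system, leave "ABC". If the pointer moved twice per iteration, the count would be wrong and the loop could step over the terminating null.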

While we're at it, let's look at a couple of other aspects of the original code.

What are those magic numbers 0x41 and 0x5A? Well, in ASCII, 0x41 is capital A, and 0x5A is capital Z. There's a school of thought that says you can't do C programming without an ASCII table handy, but in fact, laboriously looking up such codes is unnecessary extra work, because the compiler is perfectly willing to do it for us. We can write

if (*char_ptr < 'A')

and

if (*char_ptr > 'Z')

and we'll get the same result, with the added benefits that (a) the code is clearer and easier to read and (b) it's that much more portable to some hypothetical machine that doesn't use ASCII.

The magic number 0x20 that's being subtracted from the lower-case letters is the difference between a lower-case A and a capital A. (In ASCII, lower-case A is 0x61.) So if you've got a lower-case letter, subtracting 0x20 turns it into the corresponding upper-case letter. So it looks like the code in the question is misnamed: It's actually converting to upper case, not lower case. (It's also going to mistakenly convert certain other characters, since anything greater than Z is converted, which will include punctuation characters like '[' and '|'.)
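
The effect on non-letters is easy to try on single characters. Below is a small sketch that applies the question's second test to one character; `convert_like_question` is a hypothetical name, and the results described afterwards assume ASCII.

```c
/* Hypothetical helper mirroring the question's test: anything greater
   than 'Z' has 0x20 subtracted, whether or not it is a letter. */
char convert_like_question(char c)
{
    if (c > 'Z')
        c -= 0x20;
    return c;
}
```

In ASCII, 'a' (0x61) becomes 'A' (0x41) as intended, but '[' (0x5B) becomes ';' (0x3B) and '|' (0x7C) becomes '\\' (0x5C), while an upper-case letter such as 'Q' is returned unchanged.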

Finally, since the expression we've been talking about is the only part of the code that increments char_ptr, and since it acts only for non-upper-case characters, if the input contains any upper-case letters, char_ptr won't get incremented when the loop reaches one, and this code will get stuck in an infinite loop.
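
One possible way to address these points is sketched below. It is not the original author's code: the name `to_upper_case`, the int return value (used here in place of the printf and exit(0) so the caller decides what to do), and the use of islower and toupper from <ctype.h> are all choices made for this example.

```c
#include <ctype.h>

/* Sketch of a corrected version. Returns 0 on success, or -1 if a
   character below 'A' is found (the point at which the question's
   code printed "Invalid login" and exited). */
int to_upper_case(char *char_ptr)
{
    while (*char_ptr) {
        if (*char_ptr < 'A')
            return -1;
        /* Convert only real lower-case letters, not everything above 'Z'. */
        if (islower((unsigned char)*char_ptr))
            *char_ptr = (char)toupper((unsigned char)*char_ptr);
        char_ptr++;   /* advance on every iteration, converted or not */
    }
    return 0;
}
```

Because the increment is now a separate statement executed on every pass, upper-case input no longer stalls the loop, and characters such as '[' are left alone. For example, "Hello[x]" should become "HELLO[X]" in the default "C" locale.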

Steve Summit
  • Good point about char_ptr not being incremented in the “else” case. The function was written properly the first time, but then I tried to shorten the code. The conclusion is that you shouldn't chase “fewer lines of code”, and if you do, you must test the result on a variety of data. – helsereet Jul 28 '19 at 06:48