21

While I was putting together a to-uppercase function in C++ I noticed that I did not receive the expected output in C.

C++ function

#include <iostream>
#include <cctype>
#include <cstdio>

void strupp(char* beg)
{
    while (*beg++ = std::toupper(*beg));
}

int main(int charc, char* argv[])
{
    char a[] = "foobar";
    strupp(a);
    printf("%s\n", a);
    return 0;
}

Output as expected:

FOOBAR


C function
#include <ctype.h>
#include <stdio.h>
#include <string.h>

void strupp(char* beg)
{
    while (*beg++ = toupper(*beg));
}

int main(int charc, char* argv[])
{
    char a[] = "foobar";
    strupp(a);
    printf("%s\n", a);
    return 0;
}

The output is the expected result with the first character missing

OOBAR

Does anyone know why the result gets truncated while compiling in C?

Alex Koukoulas
  • 998
  • 1
  • 7
  • 21
  • 5
    And if you really wanted to do this in `C++`: `std::transform(a, a + strlen(a), a, std::toupper);` – PaulMcKenzie Oct 12 '15 at 16:40
  • Can you explain why you expected this to convert a string to uppercase? Specifically, why did you expect the right side of the `=` to be evaluated before the left? – David Schwartz Oct 12 '15 at 16:41
  • I'm thankful to all people who gave feedback and for the valuable info provided – Alex Koukoulas Oct 12 '15 at 16:46
  • Also add cast: `while (*beg = toupper((unsigned char) *beg)) beg++;`. `toupper()` is not well defined for negative `char` values. – chux - Reinstate Monica Oct 12 '15 at 16:48
  • Don't write so complex code if you don't understand sequence points and operation priorities. And even if you understand them - care about your colleagues, write each statement on a new line. – Schullz Oct 12 '15 at 22:13
  • 1
    @Schullz I strongly believe that learning from your mistakes is the most efficient way not to make the same mistake again. Hence as this problem stemmed solely from experimentation and not in a team project environment I don't think it was a mistake to write this code snippet to begin with – Alex Koukoulas Oct 12 '15 at 23:04
  • @PaulMcKenzie, The docs for `std::transform` state that "std::transform does not guarantee in-order application of unary_op or binary_op. To apply a function to a sequence in-order or to apply a function that modifies the elements of a sequence, use std::for_each". – Millie Smith Oct 13 '15 at 06:52
  • So then something like `std::for_each(a, a + strlen(a), [](char &c){ c = std::toupper(c); });`? – Millie Smith Oct 13 '15 at 07:00
  • 1
    @MillieSmith In this case, does it matter the order of which item will be made upper case? Anyway, most, if not all of the classical C++ approaches of mutating a string use `std::transform`. – PaulMcKenzie Oct 13 '15 at 10:09
  • @AlexKoukoulas i think so about learning from mistakes. What about trying doing experiments with different compilers? It could be interesting :) – Schullz Oct 13 '15 at 14:18

4 Answers4

30

The problem is that there is no sequence point in

while (*beg++ = toupper(*beg));

So we have undefined behavior. What the compiler is doing in this case is evaluating beg++ before toupper(*beg) In C where in C++ it is doing it the other way.

John Kugelman
  • 349,597
  • 67
  • 533
  • 578
NathanOliver
  • 171,901
  • 28
  • 288
  • 402
  • Does this mean that the classic c-string copying one-liner `while(*s++ = *t++) ;` has undefined behavior? – Mark H Oct 13 '15 at 01:20
  • 3
    @markh No as that is two different variables. This is synonymous with `while (*s++ = *s++)` – NathanOliver Oct 13 '15 at 01:40
  • @SteveJessop Actually, `while(*s++ = *t++);` has defined behaviour even if `s == t`. The reason `strcpy` is undefined with overlapping regions is that `strcpy` isn't necessarily implemented with `while(*s++ = *t++);`. – user253751 Oct 13 '15 at 03:11
  • @immibis: fair point, agreed it's defined. However, even if `strcpy` is implemented with that loop it still has undefined behavior for one "direction" of overlap. If the destination is greater than the source and they overlap, then the terminating nul gets overwritten before it's read, and eventually the buffer is overrun! The reason overlap is forbidden in both "directions" is for possible other implementations. – Steve Jessop Oct 13 '15 at 03:16
15
while (*beg++ = std::toupper(*beg));

leads to undefined behavior.

Whether *beg++ is sequenced before or after std::toupper(*beg) is unspecified.

The simple fix is to use:

while (*beg = std::toupper(*beg))
   ++beg;
R Sahu
  • 204,454
  • 14
  • 159
  • 270
10

The line

while (*beg++ = toupper(*beg));

contains a side effect on an entity that's being used twice. You can not know, whether or not beg++ is executed before or after the *beg (inside the toupper). You're just lucky that both implementations show both behaviors, as I'm pretty sure it's the same for C++. (However, there were some rule changes for c++11, which I'm not certain of - still, it's bad style.)

Just move the beg++ out of the condition:

while (*beg = toupper(*beg)) beg++;
J A
  • 335
  • 1
  • 8
3

with respect to above answer, the 'f' is never passed inside function, you should try using this:

     while ((*beg = (char) toupper(*beg))) beg++;
vishal
  • 2,258
  • 1
  • 18
  • 27