
For example,

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

char * integerToString(void);

int main() {
    char *myString;
    do {
        myString = integerToString();
    } while (myString == (char *)-1); // worked as intended
    free(myString);
    return 0;
}

char * integerToString(void) {

    int userInput;
    printf("Enter an integer: ");
    scanf("%d", &userInput);

    if (userInput < 0 || userInput > 99)
        return (char *)-1; // what happens here?

    char *myString = (char *)malloc(sizeof(char) * 2);
    myString[0] = (int)floor(userInput/10.0) + '0';
    myString[1] = userInput%10 + '0';
    return myString;
}

and the program worked as intended, but what exactly happens when you type cast an integer value (without assigning the integer to a variable) into a character pointer? Will this program always work? Thanks.

alk
Ignacio

3 Answers

4

C99:

6.3.2.3 Pointers

  3. An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

[...]

  5. An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

So casting -1 to a pointer has implementation-defined results. Therefore the answer is no: This is not guaranteed to work in general.


In particular: If it does turn out to be a trap representation, your code runs afoul of:

6.2.6 Representation of types

6.2.6.1 General

[...]

  5. Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

I.e. while (myString == (char *)-1); has undefined behavior if myString is a trap representation.
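One portable way to get a "not a real string" marker without converting an integer to a pointer is to use the address of a private static object. The sketch below is only an illustration: the names invalid_input_marker, INVALID_INPUT and two_digit_string are made up for this example, and the function takes the value as a parameter instead of calling scanf.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sentinel: the address of a private static object.
   The object is alive for the whole program, so a successful malloc
   call can never return this address. */
static char invalid_input_marker;
#define INVALID_INPUT (&invalid_input_marker)

/* Hypothetical variant of integerToString that receives the value
   as a parameter instead of reading it with scanf. */
char *two_digit_string(int value)
{
    if (value < 0 || value > 99)
        return INVALID_INPUT;        /* portable "bad input" marker */

    char *s = malloc(3);             /* two digits + terminating '\0' */
    if (s == NULL)
        return NULL;                 /* allocation failure */

    s[0] = (char)('0' + value / 10); /* tens digit */
    s[1] = (char)('0' + value % 10); /* units digit */
    s[2] = '\0';
    return s;
}
```

The caller compares the result against INVALID_INPUT and NULL, and must not pass INVALID_INPUT to free.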

melpomene
  • 1
    I don't see why it's not guaranteed to work in general. Why would it matter if the result is not correctly aligned, doesn't point to an entity of the referenced type, or is a trap representation? There's just a comparison for equality in the code. The pointer is never dereferenced. There's no reason for this not to work. – Nikos C. Dec 03 '17 at 10:37
  • 3
    @Nikos C. On some weird implementations, you might have some variable `a` such as `(char*)&a == (char*)-1` – Basile Starynkevitch Dec 03 '17 at 10:39
  • 3
    @NikosC. If it is a trap representation, then `myString == ...` has undefined behavior. And as Basile says, "implementation-defined" can also mean it inadvertently produces a pointer to an existing object in your code. – melpomene Dec 03 '17 at 10:40
  • @BasileStarynkevitch That can't happen, because of `malloc(sizeof(char) * 2)`. The return value of `malloc()` cannot be `-1`. That would require one more `char` to be allocated past `-1`, which would overflow the maximum value of a pointer. – Nikos C. Dec 03 '17 at 10:41
  • 2
    @NikosC. The return value of `malloc()` can be `(char *)-1` because `(char *)-1` can turn into anything the implementation wants. It's an arbitrary value. – melpomene Dec 03 '17 at 10:42
  • 3
    What makes you think that `malloc` can *never* give `(char*)-1`. This is certainly true on my Linux desktop, but could be false on some weird microcontroller. – Basile Starynkevitch Dec 03 '17 at 10:43
  • @BasileStarynkevitch Because two `char` were allocated. If the address of the first one is `(char*)-1`, the next one would have an address that's too big to fit in the pointer and would overflow. For example, if `ptr = malloc(2)` is `(char*)-1`, then isn't `&ptr[1]` NULL? – Nikos C. Dec 03 '17 at 10:48
  • No, an address could be much wider than a `long` so you'll lose bits in the `(char*)-1` cast... Or might be non-unique (think of 1980 era x86 in 16 bits mode, segmented). – Basile Starynkevitch Dec 03 '17 at 10:49
  • 4
    @NikosC. `(char *)-1` could produce `0x6f0a12`. I don't think you've realized what "arbitrary value" really means. – melpomene Dec 03 '17 at 10:50
  • @NikosC. You're assuming the null pointer is represented as the integer 0 and that there's a straightforward mapping between integers and pointers. The C standard guarantees neither. – Petr Skocik Dec 03 '17 at 10:50
1

What happens when you type cast an integer value into a char pointer?

In general, that is undefined behavior (at least as soon as you dereference it). Be very scared. Read a lot more about UB (it is a tricky subject).

In some documented cases, you can cast a uintptr_t or intptr_t integer value to a valid pointer.
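The best-known documented case is the round trip through uintptr_t: C99 7.18.1.4 says that a valid pointer to void converted to uintptr_t and back compares equal to the original. A minimal sketch, assuming the implementation provides the (optional) uintptr_t type; the helper name survives_round_trip is made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts a valid pointer to uintptr_t and back,
   then reports whether the result compares equal to the original.
   For a valid pointer to void the standard guarantees that it does. */
int survives_round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;   /* pointer -> integer */
    void *back = (void *)bits;       /* integer -> pointer */
    return back == p;
}
```

This guarantee covers only integers that came from a valid pointer in the first place. It says nothing about an arbitrary integer such as -1.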

In your case, your heap-allocated buffer is too short to hold a C string: there is no room for the terminating NUL byte, so any code that treats it as a string reads past the end (a buffer overflow, which is one of the many examples of UB). You also forgot to check for failure of malloc. BTW, sizeof(char) is always 1.

You could code:

if (userInput < 0 || userInput > 99)
    return NULL;                      // reject out-of-range input

char *myString = malloc(3);           // two digits plus the terminating NUL
if (!myString) { perror("malloc myString"); exit(EXIT_FAILURE); }
myString[0] = userInput / 10 + '0';   // tens digit (integer division, no floor needed)
myString[1] = userInput % 10 + '0';   // units digit
myString[2] = '\0';
return myString;
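As an alternative sketch, snprintf can write the digits and the terminating NUL in one call, so the terminator cannot be forgotten. The function name format_two_digits is hypothetical, and unlike the code above this version returns NULL both for out-of-range input and for allocation failure:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd string "00".."99", or NULL
   if value is out of range or the allocation fails. */
char *format_two_digits(int value)
{
    if (value < 0 || value > 99)
        return NULL;

    char *s = malloc(3);             /* two digits + terminating NUL */
    if (s == NULL)
        return NULL;

    /* snprintf writes at most 3 bytes here, including the NUL. */
    snprintf(s, 3, "%02d", value);
    return s;
}
```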

On most systems (but not all), (char*)-1 is never a valid address (it is always outside the virtual address space) and is never returned as the address of an object by system (or standard) functions. On my Linux/x86-64 desktop, I know that (char*)-1 is not a valid address (e.g. because it is MAP_FAILED), and I could (sometimes) use it as a non-null sentinel pointer value (which should never be dereferenced). But that makes my code less portable.

So you could decide and document that your integerToString gives (char*)-1 on out-of-range input and NULL on heap allocation failure. That would work on my Linux/x86-64 desktop (so I sometimes do that). But that is not pure (portable) C11 code.

But if you stick to the C11 standard (read n1570), it is implementation-defined whether (char*)-1 is meaningful and what it means. It might be some trap representation that you are not even allowed to compare (even if I don't know of any actual C implementation doing that).

Actually your example illustrates that people never code for pure standard C11; they always (and so do I) make additional assumptions on the C implementation; but you do need to be aware of them, and these assumptions could make the porting of your code to some hypothetical future machine a nightmare.

Will this program always work?

This question is too general. Your original program didn't even handle failure of malloc and had a buffer overflow (because you forgot the space for the terminating zero byte). However, sadly for you, it would often seem to work (and that is why UB is so scary). Consider however this (standard conforming, but non-realistic) malloc implementation as food for thought.

(explaining exactly why your program appears to behave like you want is really difficult, because you need to dive into several implementation details)

Basile Starynkevitch
  • 1
    Undefined behavior? If you *dereference* that pointer, it sure is undefined. But just abusing a `char *` as a `size_t` should be okay in itself. – Martin Ueding Dec 03 '17 at 10:29
1

This program is an example of improper error handling. The value of (char *)-1 appears to be implementation-defined; see the other answers. Since this address is unlikely to be a valid memory address returned from malloc, the program uses it as a sentinel value. The actual value is not of interest; it is only compared to the same expression in the other function.

If you run this, malloc just might return whatever value that (char *)-1 evaluates to. Then it will be interpreted as an error, although it is a valid memory address.

A better way would be to have an argument to integerToString of type int * and use this as a boolean to indicate failure. Then one would not reserve one char * value for error handling.
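A rough sketch of that idea, with made-up names (integer_to_string_checked, invalid_input) and the value passed as a parameter instead of read with scanf. Here the flag reports bad input only, so a NULL result with the flag cleared means that malloc failed:

```c
#include <stdlib.h>

/* Hypothetical variant: *invalid_input is set to 1 when value is outside
   0..99 and to 0 otherwise. A NULL return with *invalid_input == 0
   means the allocation failed. */
char *integer_to_string_checked(int value, int *invalid_input)
{
    *invalid_input = (value < 0 || value > 99);
    if (*invalid_input)
        return NULL;

    char *s = malloc(3);             /* two digits + terminating '\0' */
    if (s == NULL)
        return NULL;

    s[0] = (char)('0' + value / 10); /* tens digit */
    s[1] = (char)('0' + value % 10); /* units digit */
    s[2] = '\0';
    return s;
}
```

With this shape, every pointer value that malloc can return stays available for real results, and the caller's retry loop tests the flag instead of a magic pointer.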

Or use C++ and an exception.

Martin Ueding