
Let's say I have this code that copies one block of memory to another in a certain order based on their location:

void *my_memmove(void *dest, const void *src, size_t len)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dest;

    if(dest < src)
    {
        /* copy s to d forwards */
    }
    else
    {
        /* copy s to d backwards */
    }

    return dest;
}

The dest < src comparison is undefined behavior if src and dest do not point into the same array object (C 2018 6.5.8p5).

However, let's say I cast these two pointers to uintptr_t types:

#include <stdint.h>

void *my_memmove(void *dest, const void *src, size_t len)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dest;

    if((uintptr_t)dest < (uintptr_t)src)
    {
        /* copy s to d forwards */
    }
    else
    {
        /* copy s to d backwards */
    }

    return dest;
}

Is this still undefined behavior if they're not members of the same array? If it is, what are some ways that I could compare these two locations in memory legally?

I've seen this question, but it only deals with equality, not the other comparison operators (<, >, etc).

S.S. Anne

5 Answers


The conversion is legal but there is, technically, no meaning defined for the result. If instead you convert the pointer to void * and then convert to uintptr_t, a slight meaning is defined: performing the reverse operations will reproduce the original pointer (or something equivalent).

In particular, you cannot rely on the fact that one integer is less than another to mean it is earlier in memory or has a lower address.

The specification for uintptr_t (C 2018 7.20.1.4 1) says it has the property that any valid void * can be converted to uintptr_t, then converted back to void *, and the result will compare equal to the original pointer.
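As a minimal sketch of what 7.20.1.4 does promise, the round trip below converts a void * to uintptr_t and back; the result must compare equal to the original pointer. The helper name round_trip is hypothetical, and the code assumes the implementation provides the optional uintptr_t type.

```c
#include <stdint.h>

/* Hypothetical helper: void * -> uintptr_t -> void *.
   C 2018 7.20.1.4 1 guarantees that the returned pointer compares
   equal to p, provided the implementation defines uintptr_t at all. */
void *round_trip(void *p)
{
    uintptr_t u = (uintptr_t)p;
    return (void *)u;
}
```

Only the round trip is covered; the guarantee says nothing about how two such integers are ordered.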

However, when you convert an unsigned char * to uintptr_t, you are not converting a void * to uintptr_t. So 7.20.1.4 does not apply. All we have is the general definition of pointer conversions in 6.3.2.3, in which paragraphs 5 and 6 say:

An integer may be converted to any pointer type. Except as previously specified [involving zero for null pointers], the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

Any pointer type may be converted to an integer type. Except as previously specified [null pointers again], the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

So these paragraphs are no help except they tell you that the implementation documentation should tell you whether the conversions are useful. Undoubtedly they are in most C implementations.

In your example, you actually start with a void * from a parameter and convert it to unsigned char * and then to uintptr_t. So the remedy there is simple: Convert to uintptr_t directly from the void *.
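A sketch of that remedy applied to the question's function, with the copy loops filled in. It assumes, as on common flat-address implementations, that the order of the integers produced by the implementation-defined conversion matches address order; the standard does not promise this. src is cast to void * first (dropping const only for the conversion) so that the conversion starts from a plain void *.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: pick the copy direction by comparing uintptr_t values obtained
   from void *. Assumption: integer order matches address order, which is
   implementation-defined but holds on common flat-memory platforms. */
void *my_memmove(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if ((uintptr_t)dest < (uintptr_t)(void *)src) {
        /* destination starts lower: copying forwards never
           overwrites source bytes that are still to be read */
        while (len--)
            *d++ = *s++;
    } else {
        /* destination starts at or above the source: copy backwards */
        d += len;
        s += len;
        while (len--)
            *--d = *--s;
    }
    return dest;
}
```

If the regions do not overlap, either direction gives the same result, so the integer comparison only matters when they do.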

For situations where we have some other pointer type, not void *, then 6.3.2.3 1 is useful:

A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.

So, converting to and from void * is defined to preserve the original pointer, so we can combine it with a conversion from void * to uintptr_t:

(uintptr_t) (void *) A < (uintptr_t) (void *) B
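Wrapped as a small function, that expression might look like this. addr_less is a hypothetical name; the parameters are const void * so any object pointer can be passed without a cast, and each is converted to void * before uintptr_t. Following the reasoning in this answer, only distinctness of the integers is guaranteed; that their order follows memory addresses is an assumption about the implementation.

```c
#include <stdint.h>

/* Hypothetical helper for (uintptr_t)(void *)A < (uintptr_t)(void *)B.
   Guaranteed: distinct pointers yield distinct integers.
   Not guaranteed: that "less" here means "lower address". */
int addr_less(const void *a, const void *b)
{
    return (uintptr_t)(void *)a < (uintptr_t)(void *)b;
}
```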

Since (void *) A must be able to produce the original A upon conversion back, and (uintptr_t) (void *) A must be able to produce its (void *) A, then (uintptr_t) (void *) A and (uintptr_t) (void *) B must be different if A and B are different.

And that is all we can say from the C standard about the comparison. Converting from pointers to integers might produce the address bits out of order or some other oddities. For example, they might produce a 32-bit integer containing a 16-bit segment address and a 16-bit offset. Some of those integers might have higher values for lower addresses while others have lower values for lower addresses. Worse, the same address might have two representations, so the comparison might indicate “less than” even though A and B refer to the same object.
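To make the segment:offset case concrete, here is a small simulation using plain integers rather than real pointer conversions. It borrows the real-mode x86 rule that the linear address is segment * 16 + offset; pack and linear are hypothetical helpers, with pack standing in for an implementation-defined conversion that simply concatenates the two 16-bit fields.

```c
#include <stdint.h>

/* Hypothetical "pointer to integer" result: segment in the high
   16 bits, offset in the low 16 bits. */
uint32_t pack(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 16) | off;
}

/* Address actually referenced in the real-mode x86 model. */
uint32_t linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

With these, 0x1234:0x0005 and 0x1000:0x2345 both reference linear address 0x12345 yet pack to different integers, and 0x1000:0x2346 refers to a higher address than 0x1234:0x0005 while packing to a smaller integer.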

Eric Postpischil
  • Even if the result has no meaning if they do not come from same array, the behavior would be correct in this case because either is fine, right? – user202729 Aug 24 '19 at 17:16
  • @user202729: It is not correct that the result has no meaning if they do not come from the same array. The rule for the `<` operator about pointers being from the same object (or one past), including aggregates (you can compare addresses of structure members) applies **only** to using `<` with pointers. It has no relevance to using `<` with integers derived from pointers. For integers derived from pointers, we must look to the rules about converting from pointers to integers. As my answer notes, two conclusions are possible. One is that if the pointers are different, the resulting `uintptr_t` must… – Eric Postpischil Aug 24 '19 at 17:21
  • … be different. The other is that the C implementation is required to define the pointer-to-integer conversion, so answers can be found in the documentation for the implementation. – Eric Postpischil Aug 24 '19 at 17:21
  • Hmm... You might want to take a look at my question again (I'm comparing `dest` and `src`, of which one is `void *` and one is `const void *`). – S.S. Anne Aug 24 '19 at 17:27
  • @JL2210: Yes, I sidestepped that, but my answer is correct. If you want to follow the standard completely, convert to `void *`, then convert to `uintptr_t`. There is no reason for an implementation to treat `const void *` differently in this regard (unlike for other pointer types: segment-offset architectures and others might have reasons for treating `int *` differently from `void *`, for example, but I see no reason to treat `const void *` and `void *` differently), so it is fairly safe, but explicitly convert if you wish to be guaranteed by the standard. – Eric Postpischil Aug 24 '19 at 17:53

No. Each results in an implementation-defined value, and comparison of integers is always well-defined (as long as their values are not indeterminate). Since the values are implementation-defined, the result of the comparison need not be particularly meaningful in regard to the pointers; however, it must be consistent with the properties of integers and the values that the implementation-defined conversions produced. Moreover, the C standard expresses an intent that conversions of pointers to integers should respect the address model of the implementation, making them somewhat meaningful if this is followed. See footnote 67 under 6.3.2.3 Pointers:

The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.

However, some current compilers wrongly treat this as undefined behavior, at least under certain conditions, and there is a movement from compiler folks to sloppily formalize that choice via a notion of "provenance", which is gratuitously internally inconsistent and a disaster in the making (it could be made internally consistent and mostly non-problematic with trivial changes that are cost-free to code where it matters, but the people who believe in this stuff are fighting that for Reasons(TM)).

I'm not up-to-date on the latest developments in the matter, but you can search for "pointer provenance" and find the draft documents.

R.. GitHub STOP HELPING ICE
  • Argh, that "intended to be" is making my life so much harder right now. – S.S. Anne Aug 24 '19 at 17:34
  • *"However, some current compilers wrongly treat this as undefined behavior, at least under certain conditions, and there is a movement from compiler folks to sloppily formalize that choice via a notion of "provenance"*. Now where's *that* source? As far as I know, it is just that you cannot use integer *math* to *escape* the provenance. – Antti Haapala -- Слава Україні Aug 24 '19 at 18:04
  • @AnttiHaapala: Definedness of conversions: 6.3.2.3 Pointers, ¶6, and 7.20.1.4 Integer types capable of holding object pointers, ¶1. Definedness of comparison of integers: 6.5.8 Relational operators, ¶6. – R.. GitHub STOP HELPING ICE Aug 24 '19 at 18:18
  • Regarding "escape the provenance", there is no such concept in the C language. It's a proposal invented by compiler folks to make their existing invalid transformations valid (because their IRs fail to distinguish types correctly) that has all sorts of internal inconsistencies where the definedness of expressions can't be defined abstractly and can only be defined in terms of implementation details of the compiler. – R.. GitHub STOP HELPING ICE Aug 24 '19 at 18:22
  • http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2090.htm#q7-can-equality-testing-on-integers-that-are-derived-from-pointer-values-be-affected-by-their-provenance - the behaviour did exist in some versions of GCC and Clang and was considered a compiler bug – Antti Haapala -- Слава Україні Aug 24 '19 at 18:41
  • @R..: The problem is that the authors of C assumed it would be run on platforms whose execution model mirrored real machines, and didn't think it necessary to mandate all the corner cases that real machines would handle anyway. The execution model used by clang and gcc may be useful for some purposes, but it's not the one for which C was designed, nor is it suitable for all purposes for which the intended execution model would be suitable. – supercat Sep 01 '19 at 03:24
  • @AnttiHaapala: What useful purpose is served by having compilers try to track provenance through integers? Are there any non-contrived situations in which the performance of high-performance code would be meaningfully impaired by treating pointer-to-integer conversions as a "you don't know how this pointer will be used", and integer-to-pointer conversions as "you don't know where this pointer came from" indications? – supercat Sep 01 '19 at 03:33
  • @R..: BTW, clang and gcc sometimes treat *equality* comparisons as UB in cases where one pointer directly identifies an object, and the other is a "past-one" pointer for an unrelated object. IMHO, the Standard should allow implementations to treat the results of such comparisons as Unspecified if they define a "warning" macro, but clang and gcc go well beyond that. They assume that if `x` is coincidentally equal to `y`, and `y` can't alias `z`, that would imply that `x` can't access `z` either, even if `x` is in fact visibly derived from `z`. – supercat Sep 01 '19 at 03:49

Comparing two pointers converted to uintptr_t should not have undefined behaviour at all, nor even unspecified behaviour. Note that you should first cast the values to void * to ensure the same representation, before casting to uintptr_t. However, compilers have had behaviour where two pointers were deemed to be unequal even though they pointed to the same address, and likewise, these pointers cast to uintptr_t compared unequal to each other (GCC 4.7.1 - 4.8.0). The latter is not allowed by the standard. However, there is ongoing debate on the extent of pointer provenance tracking, and this is part of it.

The intent of the standard according to C11 footnote 67 is that this is "to be consistent with the addressing structure of the execution environment". The conversion from pointer to integer is implementation-defined and you must check the implementation for the meaning of the cast. For example for GCC, it is defined as follows:

The result of converting a pointer to an integer or vice versa (C90 6.3.4, C99 and C11 6.3.2.3).

  • A cast from pointer to integer discards most-significant bits if the pointer representation is larger than the integer type, sign-extends if the pointer representation is smaller than the integer type, otherwise the bits are unchanged.

  • A cast from integer to pointer discards most-significant bits if the pointer representation is smaller than the integer type, extends according to the signedness of the integer type if the pointer representation is larger than the integer type, otherwise the bits are unchanged.

  • When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.

For example, with GCC on x86-32 and x86-64, we can be assured that converting a pointer to uintptr_t yields its linear offset unchanged.


The last clause refers to pointer provenance, i.e. the compiler can track the identity of a pointer stored in a (u)intptr_t, just as it can track the identity of a pointer in any other variable. This is entirely allowed by the C standard, because the only guarantee it gives is that you can cast a pointer to void to (u)intptr_t and back again.

For example:

 char foo[4] = "abc";
 char bar[4] = "def";

 if (foo + 4 == bar) {
     printf("%c\n", foo[4]); // undefined behaviour
 }

Even if foo + 4 happens to compare equal to bar (which the C standard allows), you cannot access foo[4], because it does not alias bar[0]. Likewise, even if foo + 4 == bar, you cannot do

 uintptr_t foo_as_int = (uintptr_t)(void *)foo;
 if (foo_as_int + 4 == (uintptr_t)(void *)bar) {
     char *bar_alias = (void *)(foo_as_int + 4);

     printf("%c\n", bar_alias[0]); // undefined behaviour
 }
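Going back to the first GCC rule quoted above (when the sizes match, the bits are unchanged), a sketch of what that lets you rely on with GCC on x86-32 and x86-64: the difference of two converted pointers into the same array equals their byte distance. byte_distance is a hypothetical helper, and this is a documented GCC property rather than something the C standard guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. Assumes an implementation (such as GCC on x86)
   whose pointer-to-integer conversion keeps the address bits as-is,
   so integer subtraction gives the distance in bytes. */
uintptr_t byte_distance(const void *lo, const void *hi)
{
    return (uintptr_t)(void *)hi - (uintptr_t)(void *)lo;
}
```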
  • The Standard relies upon quality implementations intended for various tasks to behave sensibly in cases beyond those mandated by the Standard, whenever necessary to accomplish those tasks. From the point of view of the Standard, even evaluating `((char*)(uintptr_t)foo)[0]` would invoke Undefined Behavior; if, e.g. `foo` happened to equal `bar+4`, `((char*)(uintptr_t)foo)` might yield `bar+4`, and attempting to dereference that would yield UB even though it happens to equal `foo`. To be sure, support for integer-to-pointer conversions would be pretty useless if they weren't defined... – supercat Sep 01 '19 at 03:40
  • ...in cases beyond what the Standard mandates, and there's no reason why anyone designing a quality compiler should treat the Standard as a full specification of the cases it should handle reliably. Unfortunately, the authors of clang and gcc failed to realize this when they designed their optimizer around an execution model that is grossly unsuitable for the purposes where C should be most useful, and doesn't quite fit all the corner cases of the Standard either. – supercat Sep 01 '19 at 03:43

There is no guarantee that the numeric value produced by converting a pointer to uintptr_t has any meaningful relationship to the pointer in question. A conforming implementation with enough storage could make the first pointer-to-integer conversion yield 1, the second one 2, etc. if it kept a list of all the pointers that were converted.
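A toy model of such an implementation, written as ordinary C rather than as a real compiler's conversion: each distinct pointer receives the next integer the first time it is "converted". ptr_to_id, id_to_ptr and the fixed table size MAX_PTRS are all hypothetical. The mapping still round-trips, yet integer order tracks conversion order, not address order.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PTRS 64 /* hypothetical fixed capacity */

static const void *seen[MAX_PTRS];
static size_t n_seen;

/* "Convert" a pointer: return its existing id, or assign the next one.
   Returns 0 if the table is full (0 is used as an error value here). */
uintptr_t ptr_to_id(const void *p)
{
    for (size_t i = 0; i < n_seen; i++)
        if (seen[i] == p)
            return (uintptr_t)(i + 1);
    if (n_seen == MAX_PTRS)
        return 0;
    seen[n_seen] = p;
    return (uintptr_t)++n_seen;
}

/* Reverse mapping: an id obtained above gives back the original pointer. */
const void *id_to_ptr(uintptr_t id)
{
    return (id >= 1 && id <= n_seen) ? seen[id - 1] : NULL;
}
```

If &a[1] happens to be converted before &a[0], the lower address gets the larger integer, so comparing the integers with < says nothing about memory order.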

Practical implementations, of course, almost always perform pointer-to-uintptr_t conversions in representation-preserving fashion, but because the authors of the Standard didn't think it necessary to officially recognize a category of programs that would be portable among commonplace implementations for commonplace platforms, some people regard any such code as "non-portable" and "broken". This completely contradicts the intention of the Standard's authors, who made it clear that they did not wish to demean programs that were merely conforming but not strictly conforming, but it is unfortunately the prevailing attitude among some compiler maintainers who don't need to satisfy customers in order to get paid.

supercat

No, it's only implementation-defined behavior. However, if you first use == to check whether the two regions overlap, and only compare the pointers with < or > when they do (so that both point into the same object), then it is neither implementation-defined behavior nor undefined behavior. This is how you would implement such a solution:

#include <string.h>

void *my_memmove(void *dest, const void *src, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dest;
    size_t l;

    if(dest == src)
        goto end;

    /* Check for overlap */
    for( l = 0; l < len; l++ )
    {
        if( s + l == d || s + l == d + len - 1 )
        {
            /* The two objects overlap, so we're allowed to
               use comparison operators. */
            if(s > d)
            {
                /* copy forwards */
                break;
            }
            else /* (s < d) */
            {
                /* copy backwards */
                s += len;
                d += len;
                while(len--)
                {
                    *--d = *--s;
                }
                goto end;
            }
        }
    }

    /* They don't overlap or the source is after
       the destination, so copy forwards */
    while(len--)
    {
        *d++ = *s++; /* destination <- source */
    }

end:
    return dest;
}
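The overlap check from the loop above can be factored into a standalone helper. regions_overlap is a hypothetical name; like the original loop it uses only ==, which is defined even for pointers into unrelated objects, and it takes O(len) time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Two regions of len bytes overlap exactly when the first or the last
   byte of the destination lies inside the source region. Only == is
   used, so no relational comparison of unrelated pointers occurs. */
bool regions_overlap(const void *dest, const void *src, size_t len)
{
    const unsigned char *s = src;
    const unsigned char *d = dest;

    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++)
        if (s + i == d || s + i == d + len - 1)
            return true;
    return false;
}
```

Once this returns true, both pointers are known to point into the same object, so comparing them with < or > is well-defined.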
S.S. Anne