Is it possible in C (not invoking UB) to check if two objects overlap?

Question

When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.

If we have two pointers referencing the same type arrays and we have lengths of those arrays can we find if those arrays do not overlap without invoking a UB?

Remark: I am not interested in examples showing me that in the real life (implementation etc) it can be done. So please do not show the code (unless you can prove [standardwise] that is UB free).

LOL, I think the answer is (_possibly_ UB if the objects do NOT overlap) and (not UB if they DO overlap). — Dúthomhas, Dec 29 '22 at 00:24
I presume you are referring to [this question](https://stackoverflow.com/questions/4023320/how-to-implement-memmove-in-standard-c-without-an-intermediate-copy)? — Dúthomhas, Dec 29 '22 at 00:27
Is it actually allowed, with defined behaviour, for the objects to overlap *without* either one being a member of the other, or both being in the same explicit `union`? — Karl Knechtel, Dec 29 '22 at 00:51
I would cast to `uintptr_t` and then compare. I think it's implementation-defined then? Which is as good as it gets. — HolyBlackCat, Dec 29 '22 at 08:58
Come to the dark side (of C++) and use [std::less](https://en.cppreference.com/w/cpp/utility/functional/less) — Aykhan Hagverdili, Dec 29 '22 at 12:43
@AyxanHaqverdili note that std::less is allowed to interleave elements of unrelated arrays, so it could generate false positives. — Raymond Chen, Dec 29 '22 at 13:45
@RaymondChen can you elaborate on that? [Here](https://godbolt.org/z/zYnW16bbj)'s my example of a well defined `overlaps` function. Can you tell me how that can fail while satisfying the requirement of a 'strict total order'? — Aykhan Hagverdili, Dec 29 '22 at 14:15
@0___________ By "UB" do you mean, undefined behavior? In that case, perhaps you should indicate which standard you want to comply with. — jpaugh, Dec 29 '22 at 16:26
@jpaugh all known to me say the same. And indeed, UB means Undefined Behaviour — 0___________, Dec 29 '22 at 21:43
@AyxanHaqverdili One example where overlapping of unrelated data is possible (while still having a strict total order) is the [X86 segmented memory model](https://en.wikipedia.org/wiki/Flat_memory_model#X86_segmented_memory_model). Here are a few good reads about this: [How to check if a pointer is in a range of memory](https://devblogs.microsoft.com/oldnewthing/20170927-00/?p=97095) (blogpost from Raymond Chen) - [similar stackoverflow question](https://stackoverflow.com/a/39161283/8411406) - [strict total order of `std::less`](https://stackoverflow.com/a/69312063/8411406) — Turtlefight, Dec 29 '22 at 23:25
@AyxanHaqverdili so your [example](https://godbolt.org/z/zYnW16bbj) contains 2 assumptions: (1) running on a flat architecture and (2) the [**implementation-defined** strict total ordering](https://eel.is/c++draft/comparisons.general#2) provided by `std::less` is based solely on the linear address of the pointers. (this is implementation-defined so it isn't strictly required to be the case) – As long as those 2 assumptions hold for your given architecture + implementation it'll probably work fine - but the standard by itself does not guarantee that - it's *implementation-defined*. — Turtlefight, Dec 29 '22 at 23:52
If `uintptr_t` is available, you cast to that, and do the appropriate arithmetic there, you've moved from code with possibly undefined behavior to code with just implementation defined behavior. — Petr Skocik, Dec 30 '22 at 20:12

dbush · Accepted Answer · 2022-12-29T14:20:50.417

It is possible in standard C, though not as efficient as a non-standard approach.

The above quoted passage from section 6.5.8p5 of the C11 standard applies to relational operators, i.e. <, >, <=, and >=. The equality operators == and != do not have this restriction. They can be used to compare any two object pointers for equality.

Specifically, section 6.5.9p6 regarding the equality operators state:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

So you can check for overlap in a standard-compliant way by using == along with a pair of unsigned char * to iterate through the bytes of each object and compare their addresses for equality.

For example:

int overlap = 0;
unsigned char *o1 = (unsigned char *)&obj1;
unsigned char *o2 = (unsigned char *)&obj2;
for (int i=0; !overlap && i < sizeof obj1; i++) {
    for (int j=0; !overlap && j < sizeof obj2; j++) {
        if (o1 + i == o2 + j) {
            overlap = 1;
        }
    }
}

A more efficient approach would be to check the addresses of only the first byte of one object against the addresses of each byte in the other object, since if there is an overlap then the start of one object must be within the other:

int overlap(const void *p1, size_t size1, const void *p2, size_t size2)
{
    const unsigned char *o1 = p1;
    const unsigned char *o2 = p2;
    for (int i=0; i < size1; i++) {
        if (o1 + i == o2) {
            return 1;
        }
    }
    for (int i=0; i < size2; i++) {
        if (o2 + i == o1) {
            return 1;
        }
    }
    return 0;
}

Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/250741/discussion-on-answer-by-dbush-is-it-possible-in-c-not-invoking-ub-to-check-if). — Henry Ecker, Dec 29 '22 at 22:18

H.S. · Answer 2 · 2022-12-29T16:01:40.680

The accepted answer is addressing OP's question by referring the appropriate section of language standard. But the second snippet of code posted in accepted answer will fail, in case, when the first object (array) is subset of second object (array) in such a way that first object is completely overlapped by second object but excluding the start and end element of second object i.e. overlapping like this -

                             object 2
                                |
    +-----------------------------------------------------------+
    |                                                           |
    |                                                           |

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

        |                                                   |
        |                                                   |
        +---------------------------------------------------+
                                |
                             object 1 (any subset of this region)

This post is just a couple of modifications to address the problem in @dbush post second code snippet as well as making it a little more efficient by considering the size of element type of array in question.

/*
 * Parameters:
 * obj1    : Pointer to array1
 * obj1_sz : Size of array1
 * obj2    : Pointer to array2
 * obj2_sz : Size of array2
 * type_sz : Size of type of elements of array
 *
 * Return:
 * 0 - No overlap
 * 1 - Overlap
 *
 * [Assumption: Both array1 and array2 are of same type]
 */

int check_overlap (const void *obj1, size_t obj1_sz, const void *obj2, size_t obj2_sz, size_t type_sz) {
    const unsigned char *pobj1 = obj1;
    const unsigned char *pobj2 = obj2;
    size_t sz1 = obj1_sz;
    size_t sz2 = obj2_sz;

    if (obj1_sz < obj2_sz) {
            pobj1 = obj2;
            pobj2 = obj1;
            sz1 = obj2_sz;
            sz2 = obj1_sz;
    }

    for (size_t i = 0; i < sz1; ++i) {
            if ((pobj1 + (i * type_sz) == pobj2) ||
                (pobj1 + (i * type_sz) == pobj2 + ((sz2 - 1) * type_sz))) {
                    return 1;
            }
    }
    return 0;
}

DrSheldon · Answer 3 · 2022-12-30T21:47:26.507

Not in a portable way. There are several false negatives.

Counterexample #1: Memory aliasing

It is unusual for a device (e.g. RAM, ROM, or memory-mapped I/O) to use all of the address pins coming out of the processor. Typically, whatever number of address lines are needed by the device are connected to the lowest-order address lines of the processor, the highest address lines are used to select the device, and the address lines in between are not connected:

MSB -------- Address bus -------- LSB
| | ... | |  x x ... x x  | | ... | |
chip select  unconnected   to device

Such a device can be addressed as a block in the address space. However, the device also appears as several other blocks in the address space; each of these blocks physically point to the same locations on the device! The effect is called memory aliasing, and is much more common than you may realize.

For example, imagine a system with 16-bit addresses. Perhaps the top 4 address lines are used to select which chip is being addressed. Suppose we have a device assigned to A15:A12 == 0xE. Furthermore, this device only has 8 address lines coming out of it, so we connect those to A7:A0.

This device appears as addresses 0xE000 through 0xE0FF. However, it also appears at 0xE100 through 0xE1FF. Indeed, it appears 16 times in the address space, at any block 0xEz00 through 0xEzFF. Worse, each of these blocks physically point to the same thing. An access to 0xE123 is the same as an access to 0xE223, 0xE323, 0xE423, and so on.

So you can have two objects in memory that seem to point to different areas of memory, but in fact actually point to the same thing:

char *x = (char *)0xE000;
char *y = (char *)0xE300;
if (overlap(x, y, 16)) { ... }

A naive implementation of overlap() would report these as two different objects. But they are the same object; writing to x[] changes y[]. Therefore, in this case you will get a false negative. A correct implementation of overlap() would require and depend upon full knowledge of the system's memory map, making such a function completely non-portable.

Counterexample #2: Shared memory

Suppose x and y are overlapping objects in process A. We then use the operating system to create shared memory between process A and process B. Specifically, xx is a shared memory pointer in process B that points to x, and yy is a shared memory pointer in process B that points to y.

Back in process A, it's not hard to write a function that determines that x and y indeed overlap.

But depending on the operating system, pointers xx and yy in process B may look nothing like overlapping objects. But in reality, they do indeed point to overlapping objects. So you will get a false negative.

Is it theoretically possible to write a function that checks across processes for overlap? Probably, but keep in mind that I can make the problem even more difficult. I can create subsets of xx and yy that still overlap; I can share memory from process B to a third process; and so on. In any case, any such solution is not portable.

Counterexample #3: 8086 far pointers

The 8086 architecture on the original IBM PC used a type of memory mapping called "segmentation". A 16-bit register called the "segment" was multiplied by 16 and then added to another 16-bit register with the "base address" to obtain the 20-bit physical address.

Programs needing less than 64k of memory could get away with just the 16-bit base addresses, called "near pointers". But programs that needed more than 64k of memory had to maintain 32-bit "far pointers" which contained both the segment and the base address.

Because of the pointer arithmetic of segmentation, it is quite easy to make two far pointers that seem to be quite different, yet point to the same object:

far char *x = (far char *)0x12340005L;
far char *y = (far char *)0x10002345L;

In this case, x and y both point to the same physical address 0x12345, even though they are very different bit patterns.

Some compilers would treat x == y as false because they have different bit patterns. Other compilers would do the math (with a performance penalty) and return true. Yet other compilers let you choose either behavior with a command-line switch or #pragma.

The OP complains that these examples represent compilers that are not "standard-conforming". The argument is that if two pointers actually do point to the same object, then the standard says they must compare ==.

If you're going to be such a language-lawyer, then no compiler has even conformed to the standard. Not gcc, not Microsoft C (two compilers proud of their conformance). Basically every system that has had a C compiler has had some degree of memory aliasing (counterexample #1). So every C compiler is guilty of allowing two != pointers point to the same thing.

On the other hand, if you interpret the standard by its intended meaning instead of its literal meaning, then those compilers do conform to the standard.

Sure, these are edge cases. Most programs are in user space, where #1 is hidden away. Few programs use shared memory (#2). And no one likes programming in a segmented memory model (#3). But exceptions like these are why the standard has so many instances of undefined behavior; many things which work in one case cannot be made to work that way on other cases.

If pointers can be equal and reference the same array then implementation is not conforming and any standard related deliberations make no sense — 0___________, Dec 30 '22 at 20:15
@0___________: Not sure what your comment meant. In each of my counterexamples, there are two pointers which reference the same (or at least overlapping) array, yet are *not equal*. — DrSheldon, Dec 30 '22 at 20:27
Then the C compiler used is not conforming. Two pointers have to be equal if they reference the same element of an array. If in your implementation they are not equal, then your implementation is not conforming. So your examples are wrong considering conforming C implementations. — 0___________, Dec 30 '22 at 20:31
@0___________ I don't think that this has anything to do with the compiler. How would a compiler know, what address pins a particular PCB layout uses? — Marco, Dec 30 '22 at 22:43
Even if multiple bit patterns could represent addresses that identify the same storage, only one of them at a time could represent a pointer to the start of any particular object, and only of of them at a time could identify the end of any particular object. — supercat, Dec 31 '22 at 01:38
The catch is that the only way to created aliased or shared memory is via mechanisms not covered by the standard. All objects created in standard-conforming ways will behave correctly with respect to `==`. Objects created outside the standard are naturally not covered by the standard. Implementations are careful to ensure that objects *that they themselves create* behave correctly. If you start creating objects in nonstandard ways, then the implementation is not obligated to handle them in standard ways. — Raymond Chen, Jan 01 '23 at 19:34

score 3 · Answer 4 · edited Feb 16 '23 at 08:13

3

Well, if we're going to be language-lawyering, I raise you this:

// SPDX-License-Identifier: CC0-1.0

#include <stddef.h>
#include <stdbool.h>
#include <stdint.h>

bool overlap(const void *p1, size_t s1, const void *p2, size_t s2)
{
    const uintptr_t p1b = (uintptr_t) p1;
    const uintptr_t p2b = (uintptr_t) p2;
    const uintptr_t p1e = (uintptr_t) ((char*) p1 + (s1 - 1));
    const uintptr_t p2e = (uintptr_t) ((char*) p2 + (s2 - 1));
    return (p1b <= p2b && p2b <= p1e)
        || (p2b <= p1b && p1b <= p2e);
}

This code invokes implementation-defined behavior, not undefined behavior.[1] Obviously, this is by no means portable, but in most cases this should work.

[1]: ISO/IEC 9899:2018, § 6.3.2.3, par. 6 ("Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined.").

edited Feb 16 '23 at 08:13

Unrelated String

188
2
9

answered Feb 15 '23 at 15:44

nfitzen

31
2

it does if they overlap as you cant compare pointers (or converted to integers) if they do not reference the same array. – 0___________ Feb 15 '23 at 18:32
2

The C standard makes comparing pointers UB, but comparing integers is not. Would a conversion not affect that? – nfitzen Feb 15 '23 at 20:09
it does, otherwise the point about comparing pointers would not make any sense. You can't compare poiners but you can compare pointers converted to integers. Then why can't you compare pointers? – 0___________ Feb 15 '23 at 20:17
2

Because comparing pointers isn't exactly portable, and sometimes integer conversion won't have any semantic meaning. Making a core feature -- comparing arbitrary pointers -- optional like that would be absurd for a standard. – nfitzen Mar 13 '23 at 14:18

score 2 · Answer 5 · answered Dec 30 '22 at 14:42

2

Well, since you didn't say anything about preserving data:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

bool overlaps(void* p1, void* p2, size_t sz1, size_t sz2) {
    if (!p1 || !p2 || !sz1 || !sz2) return false; /* empty ranges ignored */

    memset(p1, 0, sz1);
    memset(p2, 1, sz2);

    return !!memchr(p1, 1, sz1);
}

This is completely well defined.

answered Dec 30 '22 at 14:42

Aykhan Hagverdili

28,141
6
41
93

2

not every array is modifiable. UB -> `overlaps("123456", "123", 7,4);` – 0___________ Dec 30 '22 at 14:59
You did not mention this as a requirement. You can have any function cause UB by violating its contract. – Aykhan Hagverdili Dec 30 '22 at 15:09
Your answer is simply not correct. The method has to be universal. – 0___________ Dec 30 '22 at 17:20
1

@0___________ Where in your question did you mention it has to work with immutable arrays? Your requirements were (1) detect if arrays overlap and (2) don't cause any undefined behavior. This answer perfectly satisfies both your requirements for mutable arrays. All functions work within a contract. – Aykhan Hagverdili Dec 30 '22 at 17:38
1

Very simple - I did not mention anything so it **has** to work with **any** array. – 0___________ Dec 30 '22 at 18:22
6

This answer is a case of malicious compliance. It's like if somebody asks you to help them open a jar of pickles, and you solve the problem by smashing the jar on the ground. – Raymond Chen Dec 30 '22 at 19:30
2

It may be a weird answer, but I like it very much: it is unexpected, and thinking outside of the box. It should be easy to extend it, so that the original data is preserved (in temporary arrays) and later restored, if that is needed. – Thomas Mueller Jan 20 '23 at 13:41

score 2 · Answer 6 · answered Dec 31 '22 at 01:56

In the language the Standard was written to describe, it would be possible to use the equality comparison operator to check the starting address of each object with every possible address within the other. If the objects overlap, one such comparison should report a match.

In the language processed by clang and gcc, however, the equality comparison operator may only be used with two pointers that each identify a byte in some object, or with two pointers that each point just past the last byte of some object, or with a null pointer and a pointer of either of the above categories. Using it with one pointer from each of the first two categories is not allowed.

The inability of clang and gcc to reliably handle corner cases involving comparisons between pointers of the first two categories has been entered on both compilers' bug reporting systems years ago; the fact that both compilers continue to make "optimizations" that break in such cases implies that their maintainers believe the language forbids such comparisons and imposes no requirements whatsoever on the behavior of any program that performs them.

gnasher729 · Answer 7 · 2022-12-29T15:44:37.007

You can check in linear time whether &obj1[i] == &obj2[0] for some i, or &obj1[0] == &obj2[i] for some i and determine this way if there is overlap or not.

Before you do that, you cast obj1 and obj2 to uintptr_t, assume (without evidence) that pointers cast to uintptr_t behave similar to char*, and calculate i, j so that &obj1[i] should equal &obj2[j] according to your assumptions, and both indices are valid. Since comparing unrelated pointers for equality or inequality doesn't invoke UB you might be able to prove that the arrays are overlapping this way. If your implementation is weird then this doesn't help, but also won't give you wrong results. And if the arrays don't overlap, it doesn't work either. In those case you go back to the first method.

score 0 · Answer 8 · answered Jan 04 '23 at 08:27

The problem maybe more complex, when these objects have other (and different) objects as members (subobjects) which also may overlap. Like an array of strings.

Your overlap problem is more a program logic problem, because every object should have its own memory or some shared data from a data store, which than no one own. Depending on the problem, you can also use an additional memory struct array which is maintaining all start and end addresses of the components and than you only are comparing addresses.

This question s not related to any real-life usage. `language-lawyer` tag shows that it is strictly language level academic question — 0___________, Jan 04 '23 at 13:00

Is it possible in C (not invoking UB) to check if two objects overlap?

8 Answers8

Counterexample #1: Memory aliasing

Counterexample #2: Shared memory

Counterexample #3: 8086 far pointers

Linked

Related